[QUIZ] Parsing JSON (#155)


R

Ruby Quiz

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There has been a lot of talk recently about parsing with Ruby. We're seeing
some parser generator libraries pop up that make the task that much easier and
they've been stirring up interest.

In honor of that, this week's Ruby Quiz is to write a parser for JSON.

JSON turns out to turns out to be a great little example for writing parsers for
two reasons. First, it's pretty easy stuff. You can hand-roll a JSON parser in
under 100 lines of Ruby. The second advantage is that the data format is
wonderfully documented:

http://json.org/

Since JSON is just a data format and Ruby supports all of the data types, I vote
we just use Ruby itself as the abstract syntax tree produced by the parse.

Feel free to show off your favorite parser generator, if you don't want to roll
your own. Anything goes.

Here are a few tests to get you started:

require "test/unit"

class TestJSONParser < Test::Unit::TestCase
def setup
@parser = JSONParser.new
end

def test_keyword_parsing
assert_equal(true, @parser.parse("true"))
assert_equal(false, @parser.parse("false"))
assert_equal(nil, @parser.parse("null"))
end

def test_number_parsing
assert_equal(42, @parser.parse("42"))
assert_equal(-13, @parser.parse("-13"))
assert_equal(3.1415, @parser.parse("3.1415"))
assert_equal(-0.01, @parser.parse("-0.01"))

assert_equal(0.2e1, @parser.parse("0.2e1"))
assert_equal(0.2e+1, @parser.parse("0.2e+1"))
assert_equal(0.2e-1, @parser.parse("0.2e-1"))
assert_equal(0.2E1, @parser.parse("0.2e1"))
end

def test_string_parsing
assert_equal(String.new, @parser.parse(%Q{""}))
assert_equal("JSON", @parser.parse(%Q{"JSON"}))

assert_equal( %Q{nested "quotes"},
@parser.parse('"nested \"quotes\""') )
assert_equal("\n", @parser.parse(%Q{"\\n"}))
assert_equal( "a",
@parser.parse(%Q{"\\u#{"%04X" % ?a}"}) )
end

def test_array_parsing
assert_equal(Array.new, @parser.parse(%Q{[]}))
assert_equal( ["JSON", 3.1415, true],
@parser.parse(%Q{["JSON", 3.1415, true]}) )
assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
end

def test_object_parsing
assert_equal(Hash.new, @parser.parse(%Q{{}}))
assert_equal( {"JSON" => 3.1415, "data" => true},
@parser.parse(%Q{{"JSON": 3.1415, "data": true}}) )
assert_equal( { "Array" => [1, 2, 3],
"Object" => {"nested" => "objects"} },
@parser.parse(<<-END_OBJECT) )
{"Array": [1, 2, 3], "Object": {"nested": "objects"}}
END_OBJECT
end

def test_parse_errors
assert_raise(RuntimeError) { @parser.parse("{") }
assert_raise(RuntimeError) { @parser.parse(%q{{"key": true false}}) }

assert_raise(RuntimeError) { @parser.parse("[") }
assert_raise(RuntimeError) { @parser.parse("[1,,2]") }

assert_raise(RuntimeError) { @parser.parse(%Q{"}) }
assert_raise(RuntimeError) { @parser.parse(%Q{"\\i"}) }

assert_raise(RuntimeError) { @parser.parse("$1,000") }
assert_raise(RuntimeError) { @parser.parse("1_000") }
assert_raise(RuntimeError) { @parser.parse("1K") }

assert_raise(RuntimeError) { @parser.parse("unknown") }
end
end
 
Ad

Advertisements

E

Eric Mahurin

[Note: parts of this message were removed to make it a legal post.]

In honor of that, this week's Ruby Quiz is to write a parser for JSON.

JSON turns out to turns out to be a great little example for writing
parsers for
two reasons. First, it's pretty easy stuff. You can hand-roll a JSON
parser in
under 100 lines of Ruby. The second advantage is that the data format is
wonderfully documented:

http://json.org/

I definitely want to find time to do this one. What would be nice to have
is performance benchmark to compare parsers. Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

Eric
 
J

James Gray

I definitely want to find time to do this one. What would be nice
to have
is performance benchmark to compare parsers. Maybe just have a
little ruby
script that generates a stream of repeatable random (but valid) JSON.

Neat idea.

Just FYI though, I'm probably going to focus more on the parsing in
the summary that the speed.

James Edward Gray II
 
E

Eric Mahurin

[Note: parts of this message were removed to make it a legal post.]

Neat idea.

Just FYI though, I'm probably going to focus more on the parsing in
the summary that the speed.

James Edward Gray II


OK. Once I figure out exactly what JSON is, I'll probably make a random
JSON generator.

Hopefully this will make me do another release of my parser package, which
is long overdue :). At least check-in my local code into CVS.
 
T

Trans

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%. Then a parse would be as simple as,

data = eval(json)

Or, safety levels withstanding, we could conceive a safe_eval(json).

T.
 
J

James Gray

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%.

Conversion is pretty easy and definitely one way to solve this quiz.

James Edward Gray II
 
Ad

Advertisements

P

Petite Abeille

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%. Then a parse would be as simple as,

data = eval(json)

Brilliant!

But perhaps the other way around: bridge the JSON syntax discrepencies
to valid Ruby syntax, e.g:

eval( to_ruby( json ) )

Cheers,

PA.
 
A

ara howard

Maybe just have a little ruby
script that generates a stream of repeatable random (but valid) JSON.

cfp2:~ > cat a.rb
require 'rubygems'
require 'json'

def random_json
case rand
when 0 ... 1/3.0
top = Hash.new
add = lambda{|obj| top[obj] = obj}
when 1/3.0 ... 2/3.0
top = Array.new
add = lambda{|obj| top.push obj}
when 2/3.0 .. 1
top = String.new
add = lambda{|obj| top += obj}
end
10.times{ add[rand.to_s] }
top.to_json
end

puts random_json


cfp2:~ > for i in `seq 1 3`;do ruby a.rb ;done
"0.3786779826911330.2475380034343990.7052927081471540.2056530009384740.1367079874315110.6433874613518640.5329060341883540.8932613322492760.9233991888762390.561470121133217
"
{"0.758942077040095":"0.758942077040095","0.740998718448961":"0.740998718448961","0.581975309640819":"0.581975309640819","0.471066491788047":"0.471066491788047","0.150752108985123":"0.150752108985123","0.679712508205116":"0.679712508205116","0.265444532310993":"0.265444532310993","0.43229805237576":"0.43229805237576","0.880407977937905":"0.880407977937905","0.91896885679168":"0.91896885679168"}
["0.140526101058637
","0.647296447390116
","0.419874655921874
","0.67320818546074
","0.847043108967541
","0.479385904117001
","0.378678170026127
","0.707315391952609","0.26064520446906","0.460184583302929"]

a @ http://codeforpeople.com/
 
T

tho_mica_l

A bit aside, but it seems a good place to plug the thought: JSON is so
close to valid Ruby syntax. It would be great if Ruby could support
the syntax 100%.

I hoped ruby19 would already support this use of colons as in {"key":
"value"} but unfortunately not.
 
Ad

Advertisements

S

Sebastian Hungerecker

tho_mica_l said:
I hoped ruby19 would already support this use of colons as in {"key":
"value"} but unfortunately not.

It's key: value=> {:key => "value"}

HTH,
Sebastian
 
T

tho_mica_l

It's key: value>> {key: "value"}
=> {:key => "value"}

I see. Thanks for pointing this out.

Still some conversion needed for the quiz.

BTW, the ruby19 json library, which Ara mentioned, also has a parse
method which I suppose you, uhm, know of? I think the use of this
should be ... well, officially disencouraged in the context of this
quiz maybe. Since Ara has already mentioned this library, I hope this
isn't news for you.
 
J

James Gray

BTW, the ruby19 json library, which Ara mentioned, also has a parse
method which I suppose you, uhm, know of?

I suspect the number of us using 1.9 exclusively is still pretty
small, so I don't focus too much on it when writing quizzes. I knew
it had a JSON library though, yes.
I think the use of this should be ... well, officially disencouraged
in the context of this quiz maybe.

I agree. It's cheating too. :)

James Edward Gray II
 
A

-a

I suspect the number of us using 1.9 exclusively is still pretty
small, so I don't focus too much on it when writing quizzes. I knew
it had a JSON library though, yes.


I agree. It's cheating too. :)

yeah - obviously cheating. just to clarify though, for people who
might actually want to use json that this is not a 1.9 thing: it's on
rubyforge:

cfp2:~ > gem list --remote|grep json
fjson (0.1.2, 0.1.1, 0.1.0, 0.0.9, 0.0.8, 0.0.7, 0.0.6, 0.0.5, 0.0.4,
0.0.3, 0.0.2, 0.0.1)
json (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, 0.4.3,
0.4.2, 0.4.1, 0.4.0)
json_pure (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0)
Orbjson (0.0.4, 0.0.3, 0.0.2, 0.0.1)
ruby-json (1.1.2, 1.1.1)

and activesupport also includes a parser

regards.
 
T

Trans

yeah - obviously cheating. just to clarify though, for people who
might actually want to use json that this is not a 1.9 thing: it's on
rubyforge:

cfp2:~ > gem list --remote|grep json
fjson (0.1.2, 0.1.1, 0.1.0, 0.0.9, 0.0.8, 0.0.7, 0.0.6, 0.0.5, 0.0.4,
0.0.3, 0.0.2, 0.0.1)
json (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0, 0.4.3,
0.4.2, 0.4.1, 0.4.0)
json_pure (1.1.2, 1.1.1, 1.1.0, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0.0)
Orbjson (0.0.4, 0.0.3, 0.0.2, 0.0.1)
ruby-json (1.1.2, 1.1.1)

and activesupport also includes a parser

as does blow.

T.
 
Ad

Advertisements

R

Robert Dober

I do not know why I have the impression that you want make it easier
for James to leave ;).

The idea giving some show on parsing ( never really done AFAIR in a
Ruby Quiz) is a nice one!!!

Now a translation that would eval is indeed an interesting idea
especially as I am almost sure that it would be parsing too.
AFAIK Json can be read by Javascript natively (surprisingly) what
about implementing Javascript in Ruby ;)

Cheers
Robert
 
S

steve

[Note: parts of this message were removed to make it a legal post.]

Hey guys

This is my first parser. I used Nathan Sobo's Treetop parsing library (
http://treetop.rubyforge.org/, gem install treetop):

http://pastie.caboo.se/146906

require 'treetop'


File.open("json.treetop", "w") {|f| f.write GRAMMAR }


Treetop.load "json"

parser = JsonParser.new


pp parser.parse(STDIN.read).value if $0 == __FILE__



BEGIN {

GRAMMAR = %q{

grammar Json
rule json
space json_value space { def value; json_value.value; end }
end


rule json_value
string / numeric / keyword / object / array
end



rule string
'"' chars:char* '"' {
def value
chars.elements.map {|e| e.value }.join

end
}
end

rule char
!'"' ('\\\\' ( ( [nbfrt"] / '\\\\' / '/' ) / 'u' hex hex hex hex )
/ !'\\\\' .) {

def value
if text_value[0..0] == '\\\\'
case c = text_value[1..1]
when /[nbfrt]/

{'n' => "\n", 'b' => "\b", 'f' => "\f", 'r' => "\r", 't' => "\t"}[c]
when 'u'

[text_value[2,4].to_i(16)].pack("L").gsub(/\0*$/,'')
else
c
end

else
text_value
end
end
}
end


rule hex
[0-9a-fA-F]
end



rule numeric
exp / float / integer
end

rule exp
(float / integer) ('e' / 'E') ('+' / '-')? integer { def value;
text_value.to_f; end }

end

rule float
integer '.' [0-9]+ { def value; text_value.to_f; end }
end


rule integer
'-'? ('0' / [1-9] [0-9]*) { def value; text_value.to_i; end }
end



rule keyword
('true' / 'false' / 'null') {
def value

{ 'true' => true, 'false' => false, 'null' => nil }[text_value]
end
}
end



rule object
'{' space pairs:pair* space '}' {
def value

pairs.elements.map {|p| p.value }.inject({}) {|h,p| h.merge p }
end
}
end


rule pair
space string space ':' space json_value space (',' &pair / !pair) {
def value
{ string.value => json_value.value }

end
}
end


rule array

'[' space array_values:array_value* space ']' {
def value
array_values.elements.map {|e| e.value }

end
}
end

rule array_value
space json_value space (',' &array_value / !array_value) {

def value
json_value.value
end
}
end



rule space
[ \t\r\n]*
end

end

}

}

- steve

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz
until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby Talk follow the discussion. Please reply to the original quiz
message,
if you can.


-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There has been a lot of talk recently about parsing with Ruby. We're
seeing
some parser generator libraries pop up that make the task that much easier
and
they've been stirring up interest.

In honor of that, this week's Ruby Quiz is to write a parser for JSON.

JSON turns out to turns out to be a great little example for writing
parsers for
two reasons. First, it's pretty easy stuff. You can hand-roll a JSON
parser in
under 100 lines of Ruby. The second advantage is that the data format is
wonderfully documented:

http://json.org/

Since JSON is just a data format and Ruby supports all of the data types,
I vote
we just use Ruby itself as the abstract syntax tree produced by the parse.

Feel free to show off your favorite parser generator, if you don't want to
roll
your own. Anything goes.

Here are a few tests to get you started:

require "test/unit"

class TestJSONParser < Test::Unit::TestCase
def setup
@parser = JSONParser.new
end

def test_keyword_parsing
assert_equal(true, @parser.parse("true"))
assert_equal(false, @parser.parse("false"))
assert_equal(nil, @parser.parse("null"))
end

def test_number_parsing
assert_equal(42, @parser.parse("42"))
assert_equal(-13, @parser.parse("-13"))
assert_equal(3.1415, @parser.parse("3.1415"))
assert_equal(-0.01, @parser.parse("-0.01"))

assert_equal(0.2e1, @parser.parse("0.2e1"))
assert_equal(0.2e+1, @parser.parse("0.2e+1"))
assert_equal(0.2e-1, @parser.parse("0.2e-1"))
assert_equal(0.2E1, @parser.parse("0.2e1"))
end

def test_string_parsing
assert_equal(String.new, @parser.parse(%Q{""}))
assert_equal("JSON", @parser.parse(%Q{"JSON"}))

assert_equal( %Q{nested "quotes"},
@parser.parse('"nested \"quotes\""') )
assert_equal("\n", @parser.parse(%Q{"\\n"}))
assert_equal( "a",
@parser.parse(%Q{"\\u#{"%04X" % ?a}"}) )
end

def test_array_parsing
assert_equal(Array.new, @parser.parse(%Q{[]}))
assert_equal( ["JSON", 3.1415, true],
@parser.parse(%Q{["JSON", 3.1415, true]}) )
assert_equal([1, [2, [3]]], @parser.parse(%Q{[1, [2, [3]]]}))
end

def test_object_parsing
assert_equal(Hash.new, @parser.parse(%Q{{}}))
assert_equal( {"JSON" => 3.1415, "data" => true},
@parser.parse(%Q{{"JSON": 3.1415, "data": true}})
)
assert_equal( { "Array" => [1, 2, 3],
"Object" => {"nested" => "objects"} },
@parser.parse(<<-END_OBJECT) )
{"Array": [1, 2, 3], "Object": {"nested": "objects"}}
END_OBJECT
end

def test_parse_errors
assert_raise(RuntimeError) { @parser.parse("{") }
assert_raise(RuntimeError) { @parser.parse(%q{{"key": true
false}}) }

assert_raise(RuntimeError) { @parser.parse("[") }
assert_raise(RuntimeError) { @parser.parse("[1,,2]") }

assert_raise(RuntimeError) { @parser.parse(%Q{"}) }
assert_raise(RuntimeError) { @parser.parse(%Q{"\\i"}) }

assert_raise(RuntimeError) { @parser.parse("$1,000") }
assert_raise(RuntimeError) { @parser.parse("1_000") }
assert_raise(RuntimeError) { @parser.parse("1K") }

assert_raise(RuntimeError) { @parser.parse("unknown") }
end
end
 
Ad

Advertisements

J

Justin Ethier

[Note: parts of this message were removed to make it a legal post.]

Here is my solution. I do a first pass to tokenize the input and perform
basic syntax checks. Then the expression is fully converted into ruby syntax
and eval is used to load it into Ruby. It passes each of the test cases,
although some improvements could still be made.

class JSONParser
# Parse a given JSON expression
def parse(expr)
# Tokenize the input
tokens = lex(expr)

# Load the expression into ruby
# Takes advantage of the fact ruby syntax is so close to that of JSON.
# However, it would be nice to have a safe_eval to prevent against
potential injection attacks
begin
eval(ruby_convert(tokens))
rescue SyntaxError, NameError
raise RuntimeError
end
end

# Converts tokens into a single ruby expression
def ruby_convert(tokens)
expr = ""
for token in tokens
token = "=>" if token == ":" # Ruby hash syntax
token = "nil" if token == "null"
expr += token
end
expr
end

# Parses the input expression into a series of tokens
# Performs some limited forms of conversion where necessary
def lex(expr)
tokens = []
i = -1
while i < expr.size - 1
tok ||= ""
i += 1

case expr.chr
when '[', ']', '{', '}', ':', ','
tokens << tok if tok.size > 0
tokens << expr.chr
tok = ""
# String processing
when '"'
raise "Unexpected quote" if tok.size > 0
len = 1
escaped = false
while (len + i) < expr.size
break if expr[len + i].chr == '"' and not escaped
if escaped
case expr[len + i].chr
when '"', '/', '\\', 'b', 'f', 'n', 'r', 't', 'u'
else
raise "Unable to escape #{expr[len + i].chr}"
end
end
escaped = expr[len + i].chr == "\\"
len += 1
end
raise "No matching endquote for string" if (len + i) > expr.size
tokens << convert_unicode(expr.slice(i, len+1))
i += len
# Number processing
when '-', /[0-9]/
len = 0
while (len + i) < expr.size and /[0-9eE+-.]/.match(expr[len +
i].chr)!= nil
len += 1
end
num = expr.slice(i, len)

# Verify syntax of the number using the JSON state machine
raise "Invalid number #{num}" if
/[-]?([1-9]|(0\.))[0-9]*[eE]?[+-]?[0-9]*/.match(num) == nil

tokens << num
i += len - 1
# Skip whitespace
when ' ', '\t'
else
tok << expr.chr
end
end
tokens << tok if tok.size > 0
tokens
end

# Convert unicode characters from hex (currently only handles ASCII set)
def convert_unicode(str)
while true
u_idx = str.index(/\\u[0-9a-fA-F]{4}/)
break if u_idx == nil

u_str = str.slice(u_idx, 6)
str.sub!(u_str, u_str[2..5].hex.chr)
end
str
end
end


Thanks,

Justin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top