Trie data structure

J

Justin To

I'm trying to implement a trie data structure for my parsing program
that parses numbers....I'm lost as to what the algorithm would look
like. Also, I don't fully understand the wikipedia definition at
http://en.wikipedia.org/wiki/Trie when it says "...no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with..."

Any help is appreciated!
 
N

nicholasmabry

I'm trying to implement a trie data structure for my parsing program
that parses numbers....I'm lost as to what the algorithm would look
like. Also, I don't fully understand the wikipedia definition athttp://en.wikipedia.org/wiki/Triewhen it says "...no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with..."

Any help is appreciated!

The basic idea is that each node contains only a partial key, the
final piece of its full key. So if the node you're adding should have
the full key "hi":
- It will contain the partial key "i".
- Its parent will have the partial key "h".
- Its grandparent will have an empty partial key - the root node.

As the lookup algorithm example on the wiki page describes, looking up
a value by key is a matter of starting at the root node and following
each path that leads you to the next item in the full key. This
differs from a binary tree in that the tree structure is not only a
means to locate the key, but a reflection of the key itself.

Have fun!
 
D

Dave Bass

Justin said:
Also, I don't fully understand the wikipedia definition at
http://en.wikipedia.org/wiki/Trie when it says "...no node in the tree
stores the key associated with that node; instead, its position in the
tree shows what key it is associated with..."

This web page has an example of data represented as a trie:

http://tinyurl.com/56t7kh

The example is a set of words with each node corresponding to a
character.

There are various ways of implementing these sorts of data structures,
but hashes provide a powerful basis.
 
J

Justin To

Most of the articles I read say that instead of storing a value at a
node, it stores a reference. How would one accomplish this in Ruby?
Also, I'm trying to make a trie to store numbers from 100,000 to
999,999. Basically, what would the trie look like?

Thanks for the help!
 
M

Martin DeMello

Most of the articles I read say that instead of storing a value at a
node, it stores a reference. How would one accomplish this in Ruby?
Also, I'm trying to make a trie to store numbers from 100,000 to
999,999. Basically, what would the trie look like?

Take a look at http://rubyquiz.com/quiz103.html

martin
 
J

Justin To

I can't seem to grasp the algorithm for a trie. Can someone please help
me... I need the trie to store 6-digit numbers. I've read all the above
articles, but I still have no clue of how I should begin.

Thanks for the help!
 
M

Martin DeMello

I can't seem to grasp the algorithm for a trie. Can someone please help
me... I need the trie to store 6-digit numbers. I've read all the above
articles, but I still have no clue of how I should begin.

here's a quick example of a trie storing the three digit numbers 123,
145, 207, 451, 455. we take advantage of the fact that all numbers are
known to be the same length, otherwise you also need to store a
"number_ends_here?" flag on each node

{ :root =>
{ 1 =>
{ 2 => { 3 => true } },
{ 4 => { 5 => true } }
},
{ 2 =>
{ 0 => { 7 => true } }
},
{ 4 =>
{ 5 =>
{ 1 => true, 5 => true } }
}
}

you traverse it thus (untested code, but it gives the basic idea):

node = trie[:root]
num_to_check = "123"
num_to_check.split(//).each {|digit|
node = node[digit]
return false unless node
}

return true

martin
 
J

Justin To

class Trie
attr_reader :value, :children
def initialize(value=nil)
@value = value # The corresponding value
@children = [] # Store hashes inside of the
hash
end

def <<(value)
sub_trie = Trie.new(value)
@children << sub_trie
return sub_trie
# End of def <<(value)
end

# Return true if value exists or nil if value D.N.E.
def child_value?(value)
if @children.empty? # If @children is empty, then there
are no children
return 'empty'
else
# If one of the children contains the value then return true
@children.each do |child|
if(child.value==value)
return true
else
return nil
end
end
end
# End of def child_value?(value)
end

# Return the child that added the value if the value D.N.E. else
return nil, the value already exists
# DO NOT ALTER OR CALL OUTSIDE OF CLASS--belongs to def
add_number(value)
def add_digit(digit)
if(!child_value?(digit))
child = self<<(digit)
puts "#{digit} added"
return child
else
puts "#{digit}: already exists in the child."
return nil
end
end

def add_number(number)
current_node = self
number.to_s.each_byte do |byte|
puts "add_number #{byte.chr.to_i}"
puts "Nodeclass: " + current_node.class.to_s
current_node = current_node.add_digit(byte.chr.to_i)
end

end

# End of class Trie
end


t = Trie.new
t << 3

t.add_number(234)


Output:

HashTrie3.rb:50:in `add_number': undefined method `add_digit' for
nil:NilClass (NoMethodError)
from HashTrie3.rb:47:in `each_byte'
from HashTrie3.rb:47:in `add_number'
from HashTrie3.rb:62
add_number 2
Nodeclass: Trie
2 added
add_number 3
Nodeclass: Trie
3: already exists in the child. # ?
add_number 4 # ?
Nodeclass: NilClass # ?

Somewhere I'm not saving the current node correctly so when it adds 3,
it's adding it into the same array that the first 3 (t<<3) and the 2
[t.add_number(234)] is in. But then again, I could have messed up
somewhere else. Maybe I'm passing around copies of the current node and
not the ACTUAL one.

Any help is much appreciated!!!!!!!!!!!!!!!!!!!!

Justin
 
J

Justin To

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
puts "#{value} exists in #{node}."
return node.children
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end


end


def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete(digitArray[0])

else if(scan=='D.N.E.')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete(digitArray[0])
else if(scan.is_a?(Trie))
if(digitArray.size==1)
scan.number_exists = true # ??
end
node = scan
digitArray.delete(digitArray[0])
else
puts "**Scan error**"
digitArray.delete(digitArray[0])
end
end
end

end
#end function
end


def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end

# End of class Trie
end


t = Trie.new

t.add_number(434)

#984343525
# looking for a (4), (3 and 5)
# 9843--52-
puts "\n"
puts t.children[0].children[0].children.size

My program can add most numbers fine... but if it comes to something
like 434 or 4343 or I suppose any similar patterns, it doesn't work.
I've tried it with only 1 number, that way all you have to do is look
through def add_digits(digitArray) at the section for the Empty Case.

Any help is much appreciated!

Thanks!
 
J

Justin To

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
return node.children # Something wrong here ????
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end

end


def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete(digitArray[0])

end

end
#end function
end


#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end

# End of class Trie
end


t = Trie.new

t.add_number(434)

#984343525
# looking for a (4), (3 and 5)
# 9843--52-
puts "\n"

# Looking for size = 1
puts t.children[0].children[0].children.size


Actually, you can just look at this portion of code
 
A

Adam Shelly

There are a couple of issues with this code, here are some hints.
# Return true if value exists or nil if value D.N.E.
def child_value?(value)
if @children.empty? # If @children is empty, then there are no children
return 'empty'

'empty' is not nil - it will act like a true, which is the opposite of
what the comments say this function should do.
else
# If one of the children contains the value then return true
@children.each do |child|
if(child.value==value)
return true
else
return nil
end
end

this code is only testing the value of the first child: it has a
return statement in both branches of the 'if', so it will exit after
the first test.
end
# End of def child_value?(value)
end

# Return the child that added the value if the value D.N.E. else
return nil, the value already exists
# DO NOT ALTER OR CALL OUTSIDE OF CLASS--belongs to def add_number(value)
you can use the 'private' keyword here, which enforces 'do not call
outside of class'.
def add_digit(digit)
if(!child_value?(digit))
child = self<<(digit)
puts "#{digit} added"
return child
else
puts "#{digit}: already exists in the child."
return nil
end
end

def add_number(number)
current_node = self
number.to_s.each_byte do |byte|
puts "add_number #{byte.chr.to_i}"
puts "Nodeclass: " + current_node.class.to_s
current_node = current_node.add_digit(byte.chr.to_i)
end
as your comments say, If the digit already exists in the children,
add_digit will return nil. In that case, you will set current_node to
nil here. You probably want to set it to something else.
end

# End of class Trie
end

hope this helps,
-Adam
 
J

Justin To

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)

if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
return node.children # Something wrong here ????
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end

end


def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# D.N.E. case
else if(scan=='D.N.E.')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# Value exists case
else if(scan.is_a?(Trie))
puts "ok"
if(digitArray.size==1)
scan.number_exists = true # ??
end
node = scan
digitArray.delete_at(0)

# If the number already exists ---- ???
else
digitArray.delete_at(0)
end
end
end

end
#end function
end


#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end

# End of class Trie
end


t = Trie.new

t.add_number(2223)
t.add_number(2333)

puts "\n"


puts t.children[0].children[0].children.size



Hey Adam, thanks for the tips. I redid a few things, thought it was
working, but turns out it's still buggy. If you can point anything out
to me, that'd be great! Thanks again!
 
J

Justin To

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end

def each
yield value
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
each { |x| puts x }
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
return node.children # Something wrong here ????
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end

end


def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# D.N.E. case
else if(scan=='D.N.E.')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# Value exists case
else if(scan.is_a?(Trie))
if(digitArray.size==1)
scan.number_exists = true # ??
end
node = scan
digitArray.delete_at(0)

# If the number already exists ---- ???
else
#set node = child node
puts "**Scan error**"
digitArray.delete_at(0)
end
end
end

end
#end function
end


#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end

# End of class Trie
end


t = Trie.new

File.open('output.txt', 'r').each do |x|

t.add_number(x.chomp.to_i)

end

t.output



I think I've finally got it to work. Now, I'm going to try to implement
a method #output to output all of the numbers... now I'm in the crap
T_T. Firstly, the #output I have above displays all of the nodes, but
now I need to say:
(*) Traverse every node and
when you reach a node whose
number_exists = true,
make that list of letters
into a word and store it in
an array.

Ex.

nil
/
1
/
2 => number_exists = true (12)
/
3 => number_exists = true (123)

Final array: storage[12, 123]

Thanks for any help!

Justin
 
J

Justin To

class Trie
attr_reader :value, :children
attr_accessor :number_exists
def initialize(value=nil, number_exists=false)
@value = value
@children = []
@number_exists = number_exists # BOOL
end

def <<(value)
sub_trie = Trie.new(value)
children << sub_trie
return sub_trie
end

def each
info = [value,number_exists, children]
yield(info)
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
numberCollector = []
tempString= ''
each do |x|
tempString.concat(x[0].to_s)
if(x[1])
numberCollector.push(tempString)
if(x[2].size!=0)
tempString=''
else
x[2].each
end
end
end

puts numberCollector
end
#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
return node.children # Something wrong here ????
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end

end


def add_digits(digitArray)
node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# D.N.E. case
else if(scan=='D.N.E.')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# Value exists case
else if(scan.is_a?(Trie))
if(digitArray.size==1)
scan.number_exists = true # ??
end
node = scan
digitArray.delete_at(0)

# If the number already exists ---- ???
else
#set node = child node
puts "**Scan error**"
digitArray.delete_at(0)
end
end
end

end
#end function
end


#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end

# End of class Trie
end


t = Trie.new

File.open('output.txt', 'r').each do |x|

t.add_number(x.chomp.to_i)

end

t.output



I can display numbers that don't have similar prefixes.. e.g. 123, 234,
345; but I can't store these examples: 123, 145, 146

My algorithm should be:
1) Check nodes until node.number_exists = true
- True: store the number up until this point:
node
- If there are more nodes after node, then
keep going and start from (1)

Any help is appreciated, thanks!!!
 
J

Justin To

def each
info = [value,number_exists, children]
yield(info)
@children.each do |child_node|
child_node.each { |e| yield e }
end
end

def output
numberCollector = []
tempString= ''

each do |x|
tempString.concat(x[0].to_s)
if(x[1])
numberCollector.push(tempString)
tempString=tempString.chop
end


end

puts numberCollector
end

Only works for numbers that share (1) common node: e.g. 22, 23, 24, 25

What can I do to output numbers that share (x) common nodes?

Thanks!
 
J

Justin To

require 'ruby-prof'

class Trie
attr_reader :value, :children
attr_accessor :number_exists, :prefix
def initialize(value=nil)
@value = value
@children = []
@number_exists = false
@prefix=''
end

def <<(value)
sub_trie = Trie.new(value)
sub_trie.prefix = prefix+value.to_s
children << sub_trie
return sub_trie
end

def each
info = [value, children, number_exists, prefix]

yield(info)
@children.each do |child_node|
child_node.each { |e| yield e }
end

end

def output(opt='all')

if(opt.is_a?(String))
if(opt.downcase=='all')
each do |x|

if(x[3]!='')
if(x[2])
puts x[3]
end
end
end

else
puts "Error: '#{opt}' not a valid selection.\n \
Valid selections\n \
- All: outputs every # in the Trie\n \
- #: (e.g. 1, 123, 43...) checks if the # exists in the Trie"
end
else if(opt.is_a?(Fixnum))

each do |x|
if(x[3].to_i==opt)
if(x[2])
puts x[3].to_s + ' exists.'
return true
end
end
end

puts "#{opt} not found."

end
end

end


#--------------------------------------------------------------------------------------

def child_value?(value, node)
if(node.children.empty?)
return 'empty' # Return: 'empty', children,
'D.N.E.'
else
i=0; while(i<node.children.size)
if(node.children.value==value)
return node.children # Something wrong here ????
else if(i==node.children.size-1)
return 'D.N.E.'
end
end
i+=1
end
end
end


def add_digits(digitArray)

node = self
while(digitArray[0])
scan = child_value?(digitArray[0], node) # Scan the trie for
the digit

# Empty case
if(scan=='empty')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# D.N.E. case
else if(scan=='D.N.E.')
node = node<<(digitArray[0])
if(digitArray.size==1)
node.number_exists = true
end
digitArray.delete_at(0)

# Value exists case
else if(scan.is_a?(Trie))
if(digitArray.size==1)
scan.number_exists = true # ?? FOR OUTPUT.. check this
area
end

node = scan
digitArray.delete_at(0)
if(digitArray.size==0)
return true
end

else
puts "**Scan error**"
digitArray.delete_at(0)
end
end
end

end
#end function
end

#--------------------------------------------------------------------------------------
def add_number(number)
if(number.is_a?(Fixnum))
digitArray = []
number.to_s.each_byte { |byte| digitArray.push(byte.chr.to_i) }
add_digits(digitArray)
else
puts "\'#{number}\' not a valid *Fixnum*"
end
end



# End of class Trie
end


t = Trie.new(nil)

open('output.txt', 'w') do |f|

i=0; while i<100000

#-----6 digit number-----
min = 100000
max = 999999
f << min + rand(max-min+1) << "\n"

i+=1
end

f.flush
end

RubyProf.start
File.open('output.txt', 'r').each do |x|

t.add_number(x.chomp.to_i)

end


result = RubyProf.stop
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, 0)





*Trie data structure for random-generated 6-digit numbers
I think I've finally managed to put together a newbie Trie data
structure. I'm kind of lost, though, when it comes to testing it. It's
suppose to be a very effective data structure, but when I test it out,
it doesn't seem very good. Compared to a hash and an array, it's much
slower at adding new numbers and only sometimes is it faster at
retrieving a specific number.


If anybody can go through my newbie code and point out areas that are
ineffective or inefficient, that'd be terrific. I'm positive that my
implementation is very poor, and I need some feedback from more
experienced people about how to fix it.

Thanks a lot!

Justin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top