Graph library recommendations for large graphs

Discussion in 'Python' started by VanL, Aug 24, 2009.

  1. VanL

    VanL Guest

    I am working on a project that will require building and querying large
    graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
    100M edges). NetworkX seems to be the most popular, but I am concerned
    that a dict representation for nodes would use too much memory -- my
    initial tests suggest that a graph with 3M nodes and 12M edges creates
    substantial memory pressure on my machine.

    Can anybody who has worked with large graphs before give a recommendation?

    Thanks,

    Van
    VanL, Aug 24, 2009
    #1
    1. Advertising

  2. VanL schrieb:
    > I am working on a project that will require building and querying large
    > graph objects (initially 8M nodes, 30-40M edges; eventually 40M nodes,
    > 100M edges). NetworkX seems to be the most popular, but I am concerned
    > that a dict representation for nodes would use too much memory -- my
    > initial tests suggest that a graph with 3M nodes and 12M edges creates
    > substantial memory pressure on my machine.
    >
    > Can anybody who has worked with large graphs before give a recommendation?


    My initial tests show otherwise. The below test-script creates 3 million
    nodes with 12 million adjacencies, on my 2GB Notebook.

    The theoretical limit for this (if we assume pointer-based
    adjacency-references which makes sense if we have sparse graphs as your
    numbers indicate) is (32 bits assumed):

    - 8 bytes per node (4 byte pointer to adjacency list, 4 byte int for
    counting the number of adjacencies in that list)
    - 4 bytes per adjacency

    This is 60.000.000 for your example - roughly 60MB. On my machine, the
    process has 320.000.000MB - (roughly) a factor five. Given the much
    richer properties a Python-object (and python-lists) have thas is pretty
    good I'd say.

    So for your eventual size of 40M nodes, 100M edges, we have a
    theoretical amount of 560MB, times 5 makes 2.5 GB. Not exactly a low
    memory profile, but manageable on modern hardware.

    I don't know anything about NetworkX - it still might be the better
    solution, given the underlying C-based algorithms. But if all you need
    is to represent a graph of that size, it appears to be working.


    ---- test.py ----

    import random
    import gc
    import time

    class Node(object):

    __slots__ = ["adjacencies", "value", "id"]

    def __init__(self, id):
    id = id
    value = random.random()
    self.adjacencies = []


    nodes = []

    gc.disable()
    nc = 3000000

    for i in xrange(nc):
    nodes.append(Node(i))
    if (i % 1000) == 0:
    print i

    for i in xrange(12000000):
    a = random.randint(0, nc - 1)
    b = random.randint(0, nc - 1)
    while a == b:
    b = random.randint(0, nc)
    nodes[a].adjacencies.append(nodes)
    if (i % 1000) == 0:
    print "e", i


    gc.enable()
    while True:
    time.sleep(1)
    traversed = set()
    def depth_search(node, depth=0):
    traversed.add(node)
    if depth == 4:
    return
    for child in node.adjacencies:
    if child not in traversed:
    depth_search(child, depth+1)

    depth_search(nodes[random.randint(0, nc - 1)])

    ------


    Diez
    Diez B. Roggisch, Aug 24, 2009
    #2
    1. Advertising

  3. VanL

    Bearophile Guest

    Bearophile, Aug 24, 2009
    #3
  4. On Aug 24, 5:37 pm, VanL <> wrote:
    >
    > Can anybody who has worked with large graphs before give a recommendation?
    >


    when using large graphs another limitation may come from the various
    graph algorithm run times. Most likely you will need to squeeze out as
    much as possible and a python implementation has a lot of overhead.

    I've used the LEDA graph library with great success. This is a C++
    library with substantial syntax sugar that looks a bit like python
    (and I made some python bindings for it via SWIG and thus got the best
    of both worlds, lost the code I'm afraid).

    http://www.algorithmic-solutions.info/leda_guide/Graphs.html

    i.
    Istvan Albert, Aug 25, 2009
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jef Driesen
    Replies:
    3
    Views:
    2,529
    mlimber
    Jan 24, 2006
  2. Wolodja Wentland
    Replies:
    8
    Views:
    1,221
    anand jeyahar
    Dec 11, 2009
  3. Tiago de Paula Peixoto
    Replies:
    0
    Views:
    412
    Tiago de Paula Peixoto
    Dec 12, 2009
  4. Almoni
    Replies:
    0
    Views:
    3,084
    Almoni
    Jan 17, 2010
  5. Emilio Mayorga
    Replies:
    6
    Views:
    316
    Martien Verbruggen
    Oct 8, 2003
Loading...

Share This Page