Byte Offsets of Tokens, Ngrams and Sentences?

Discussion in 'Python' started by Muhammad Adeel, Aug 6, 2010.

  1. Hi,

    Does anyone know how to tokenize a string in Python so that it returns
    the tokens along with their byte offsets? I also need a sentence
    splitter that returns the sentences with their byte offsets, and
    finally n-grams returned with their byte offsets.

    Input:
    This is a string.

    Output:
    This 0
    is 5
    a 8
    string. 10


    Thanks,
    Muhammad Adeel, Aug 6, 2010
    #1

  2. On Fri, 06 Aug 2010 06:07:32 -0300, Muhammad Adeel wrote:

    > Does anyone know how to tokenize a string in Python so that it returns
    > the tokens along with their byte offsets? I also need a sentence
    > splitter that returns the sentences with their byte offsets, and
    > finally n-grams returned with their byte offsets.
    >
    > Input:
    > This is a string.
    >
    > Output:
    > This 0
    > is 5
    > a 8
    > string. 10


    Like this?

    py> import re
    py> s = "This is a string."
    py> # a raw string keeps \S from being treated as a string escape
    py> for g in re.finditer(r"\S+", s):
    ...     print(g.group(), g.start())
    ...
    This 0
    is 5
    a 8
    string. 10

    --
    Gabriel Genellina
    Gabriel Genellina, Aug 6, 2010
    #2
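
A caveat on the reply above: `g.start()` is an index into the string, and in Python 3 a `str` index counts characters, not bytes. For pure ASCII input the two coincide, but they diverge as soon as the text contains multi-byte characters. The following is a minimal sketch of one way to report byte offsets, assuming the text will be stored as UTF-8; the function name `tokens_with_byte_offsets` is made up for illustration and is not from the thread.

```python
import re

def tokens_with_byte_offsets(text):
    """Yield (token, byte_offset) pairs, assuming a UTF-8 encoding."""
    for m in re.finditer(r"\S+", text):
        # m.start() is a character index; encoding the prefix that
        # precedes the token gives the number of bytes before it.
        byte_offset = len(text[:m.start()].encode("utf-8"))
        yield m.group(), byte_offset

if __name__ == "__main__":
    for tok, off in tokens_with_byte_offsets("naïve café au lait"):
        print(tok, off)
```

Re-encoding the prefix for every token is quadratic in the length of the text, which is fine for short strings. For large inputs it may be preferable to encode once and run a bytes pattern such as `rb"\S+"` over the encoded data instead.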

  3. On Aug 6, 10:49 am, "Gabriel Genellina" wrote:
    > On Fri, 06 Aug 2010 06:07:32 -0300, Muhammad Adeel wrote:
    >
    > > Does anyone know how to tokenize a string in Python so that it returns
    > > the tokens along with their byte offsets? I also need a sentence
    > > splitter that returns the sentences with their byte offsets, and
    > > finally n-grams returned with their byte offsets.
    > >
    > > Input:
    > > This is a string.
    > >
    > > Output:
    > > This 0
    > > is 5
    > > a 8
    > > string. 10
    >
    > Like this?
    >
    > py> import re
    > py> s = "This is a string."
    > py> for g in re.finditer(r"\S+", s):
    > ...     print(g.group(), g.start())
    > ...
    > This 0
    > is 5
    > a 8
    > string. 10
    >
    > --
    > Gabriel Genellina


    Hi,

    Thanks. Could you please tell me how to do the same for n-grams and
    sentences as well?
    Muhammad Adeel, Aug 6, 2010
    #3
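
The thread has no reply to this follow-up. One possible approach, sketched here under stated assumptions, is to reuse the same `re.finditer` idea: build n-grams by slicing the source string from the start of the first token to the end of the n-th, and split sentences with a deliberately naive pattern that treats `.`, `!` or `?` followed by whitespace (or the end of the text) as a boundary. The helper names `tokens`, `ngrams` and `sentences` are hypothetical, and the sentence rule will mis-split abbreviations such as "e.g. this"; a dedicated sentence tokenizer would likely be needed for real text.

```python
import re

def tokens(text):
    """Return (token, start) pairs for whitespace-delimited tokens."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]

def ngrams(text, n):
    """Return (ngram, start) pairs, each n-gram spanning n adjacent tokens."""
    toks = tokens(text)
    result = []
    for i in range(len(toks) - n + 1):
        start = toks[i][1]
        last_tok, last_start = toks[i + n - 1]
        end = last_start + len(last_tok)
        # Slice the original text so inner whitespace is preserved as-is.
        result.append((text[start:end], start))
    return result

def sentences(text):
    """Naive splitter returning (sentence, start) pairs."""
    # First alternative: shortest run ending in ., ! or ? that is followed
    # by whitespace or the end of the text. Second alternative: whatever
    # unterminated text is left at the end.
    pattern = r"\S.*?[.!?]+(?=\s|$)|\S.*?$"
    return [(m.group(), m.start())
            for m in re.finditer(pattern, text, re.DOTALL)]

if __name__ == "__main__":
    s = "This is a string. Here is another one! Done"
    for gram, off in ngrams("This is a string.", 2):
        print(gram, off)
    for sent, off in sentences(s):
        print(sent, off)
```

The offsets returned here are character indices into the `str`. They equal byte offsets only when the text is pure ASCII; for other text, the prefix before each offset would have to be encoded and measured in the target encoding.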
