Anyone know of a MICR parser algorithm written in Python?

Discussion in 'Python' started by mkppk, Mar 24, 2007.

  1. mkppk

    mkppk Guest

    MICR = The line of digits printed using magnetic ink at the bottom of
    a check.

    Does anyone know of a Python function that has been written to parse a
    line of MICR data?
    Or, some financial package that may contain such a thing?
    Or, more generally, where should I look for Python code that someone
    may already have written?

    I'm working on a project that involves a check scanner that produces
    the raw MICR line as text.

    Now, that raw MICR needs to be parsed for the various pieces of info.
    The problem with MICR is that there is no standard layout. There are
    some general rules for item placement, but beyond that it is up to the
    individual banks to define how they choose to position the
    information.

    I did find an old C program written by someone at IBM, but I've read
    it and it is not code that would convert nicely to Python (maybe it's
    all the Python I'm used to, but it seems very poorly written).

    Here is the link to that C code: ftp://ftp.software.ibm.com/software/retail/poseng/4610/4610micr.zip

    I've even tried using boost to generate a Python module, but that
    didn't go well, and in the end is not going to be a solution for me
    anyway.. really need access to the Python source.

    Any help at all would be appreciated,

    -mkp
    mkppk, Mar 24, 2007
    #1

  2. Paul McGuire

    Paul McGuire Guest

    On Mar 24, 2:05 pm, "mkppk" <> wrote:
    > Does anyone know of a Python function that has been written to parse a
    > line of MICR data?

    Is there a spec somewhere for this data? Googling for "MICR data
    format specification" and similar gives links to specifications for
    the MICR character *fonts*, but not for the data content.

    And you are right, reverse-engineering this code is more than a
    10-minute exercise. (However, the zip file *does* include a nice set
    of test cases, which might be better than the C code as a starting
    point for new code.)
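
    As a rough illustration of starting from the test cases, a
    table-driven check might look like the sketch below. The column
    names, the sample rows, and the toy_transit placeholder parser are all
    made up for this example (the layout of the test file in the zip is
    not reproduced here), and it assumes the scanner marks the transit
    field with T on both sides.

```python
import csv
import io

# Made-up rows for illustration only; the real test file's columns may differ.
SAMPLE_CASES = """micr,should_parse,transit
T123456780T 1234567A,yes,123456780
T12345T 1234567A,yes,12345
1234567A,no,
"""


def toy_transit(line):
    """Placeholder parser: return the text between the first pair of T markers."""
    start = line.find("T")
    end = line.find("T", start + 1)
    if start == -1 or end == -1:
        raise ValueError("no T...T transit field")
    return line[start + 1:end]


def run_cases(csv_text, parse):
    """Run parse() over every row and collect (micr, problem) mismatches."""
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        expect_ok = row["should_parse"] == "yes"
        try:
            got = parse(row["micr"])
        except ValueError as exc:
            if expect_ok:
                problems.append((row["micr"], "unexpected failure: %s" % exc))
            continue
        if not expect_ok:
            problems.append((row["micr"], "parsed but should have failed"))
        elif got != row["transit"]:
            problems.append((row["micr"], "transit %r != %r" % (got, row["transit"])))
    return problems


# An empty list means every row behaved as the table says it should.
print(run_cases(SAMPLE_CASES, toy_transit))
```

    Swapping a real parser in for toy_transit keeps the same loop, so the
    table can grow as new bank layouts turn up.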

    -- Paul
    Paul McGuire, Mar 24, 2007
    #2

  3. mkppk

    mkppk Guest

    On Mar 24, 4:55 pm, "Paul McGuire" <> wrote:
    > Is there a spec somewhere for this data? Googling for "MICR data
    > format specification" and similar gives links to specifications for
    > the MICR character *fonts*, but not for the data content.

    Well, the problem is that the "specification" is that "there is no
    specification"; that's just the way the MICR data line has evolved in
    the banking industry, unfortunately for us developers. That being
    said, there are obviously enough banking companies out there with
    enough example data to have intelligent parsers that handle all the
    variations. And the C program appears to have all that built into it.

    It's just that I would rather not reinvent the wheel (or read old C
    code).

    So, the search continues..
    mkppk, Mar 24, 2007
    #3
  4. Paul McGuire

    Paul McGuire Guest

    On Mar 24, 6:52 pm, "mkppk" <> wrote:
    >
    > It's just that I would rather not reinvent the wheel (or read old C
    > code)..
    >

    Wouldn't we all!

    Here is the basic structure of a pyparsing solution. The parsing part
    isn't so bad - the real problem is the awful ParseONUS routine in C.
    Plus things are awkward since the C program parses right-to-left and
    then reverses all of the found fields, and the parser I wrote works
    left-to-right. Still, this grammar does most of the job. I've left
    out my port of ParseONUS since it is *so* ugly, and not really part of
    the pyparsing example.

    -- Paul

    import re

    from pyparsing import (Optional, ParseException, Suppress, Word, nums,
                           oneOf, stringEnd)

    # define values for optional fields
    NoAmountGiven = ""
    NoEPCGiven = ""
    NoAuxOnusGiven = ""

    # define delimiters
    DOLLAR = Suppress("$")
    T_ = Suppress("T")
    A_ = Suppress("A")

    # field definitions
    amt = DOLLAR + Word(nums, exact=10) + DOLLAR
    onus = Word("0123456789A- ")
    transit = T_ + Word("0123456789-") + T_
    epc = oneOf(list(nums))
    aux_onus = A_ + Word("0123456789- ") + A_

    # validation parse action
    def validateTransitNumber(s, loc, t):
        transit = t[0]
        flds = transit.split("-")
        if len(flds) > 2:
            raise ParseException(s, loc, "too many dashes in transit number")
        if len(flds) == 2:
            if len(flds[0]) not in (3, 4):
                raise ParseException(s, loc, "invalid dash position in transit number")
        else:
            # compute checksum (only defined for a 9-digit transit number)
            if len(transit) != 9:
                raise ParseException(s, loc, "transit number must be 9 digits")
            # original algorithm worked with reversed data
            ti = [int(c) for c in reversed(transit)]
            cksum = (3*(ti[8]+ti[5]+ti[2]) + 7*(ti[7]+ti[4]+ti[1]) +
                     ti[6]+ti[3]+ti[0])
            if cksum % 10 != 0:
                raise ParseException(s, loc, "transit number failed checksum")
        return transit

    # define overall MICR format, with results names
    micrdata = (
        Optional(aux_onus, default=NoAuxOnusGiven).setResultsName("aux_onus")
        + Optional(epc, default=NoEPCGiven).setResultsName("epc")
        + transit.setParseAction(validateTransitNumber).setResultsName("transit")
        + onus.setResultsName("onus")
        + Optional(amt, default=NoAmountGiven).setResultsName("amt")
        + stringEnd
    )

    def parseONUS(tokens):
        tokens["csn"] = ""
        tokens["tpc"] = ""
        tokens["account"] = ""
        # unwrap the amount if it comes back as a one-element list
        if not isinstance(tokens["amt"], str):
            tokens["amt"] = tokens["amt"][0]
        onus = tokens.onus
        # remainder omitted out of respect for newsreaders...
        # suffice to say that unspeakable acts are performed on
        # onus and aux_onus fields to extract account and
        # check numbers

    micrdata.setParseAction(parseONUS)

    # skip the header row; each test is (MICR line, all CSV fields)
    with open("checks.csv") as f:
        testdata = f.readlines()[1:]
    tests = [(flds[1], flds)
             for flds in (l.rstrip("\n").split(",") for l in testdata)]

    def verifyResults(res, csv):
        def match(x, y):
            print("_" if x == y else "X", x, "=", y)

        (Ex, MICR, Bank, Stat, Amt, AS, TPC, TS, CSN, CS, ACCT, AS, EPC, ES,
         ONUS, OS, AUX, AS, Tran, TS) = csv
        match(res.amt, Amt)
        match(res.account, ACCT)
        match(res.csn, CSN)
        match(res.onus, ONUS)
        match(res.tpc, TPC)
        match(res.epc, EPC)
        match(res.transit, Tran)

    for t, data in tests:
        print(t)
        try:
            res = micrdata.parseString(t)
            print(res.dump())
            if data[0] != "No":
                print("Passed expression that should have failed")
            verifyResults(res, data)
        except ParseException as pe:
            print("<parse failed> %s" % pe.msg)
            if data[0] != "Yes":
                print("Failed expression that should have passed")
        print()
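
    For anyone without pyparsing installed, the same top-level split can
    be approximated with the standard re module. This is only a sketch
    under the same assumptions as the grammar above (T, A and $ as
    delimiters, a 10-digit amount); the names MICR_RE, split_micr and
    transit_checksum_ok and the sample line are made up for illustration,
    and the on-us field is still left unsplit. The checksum is the same
    3-7-1 weighting used in validateTransitNumber, written without the
    reversal.

```python
import re

# Same top-level layout as the grammar above: optional aux on-us (A...A),
# optional single-digit EPC, transit (T...T), on-us, optional amount ($...$).
MICR_RE = re.compile(
    r"""
    ^\s*
    (?:A(?P<aux_onus>[0-9 -]+)A\s*)?   # auxiliary on-us, optional
    (?P<epc>[0-9])?\s*                 # EPC digit, optional
    T(?P<transit>[0-9-]+)T\s*          # transit number, required
    (?P<onus>[0-9A -]+?)\s*            # on-us field, left unsplit
    (?:\$(?P<amt>[0-9]{10})\$)?\s*$    # amount, optional
    """,
    re.VERBOSE,
)


def split_micr(line):
    """Return a dict of raw fields, with "" for absent optional fields."""
    m = MICR_RE.match(line)
    if m is None:
        raise ValueError("unrecognized MICR line: %r" % line)
    return {name: value or "" for name, value in m.groupdict().items()}


def transit_checksum_ok(transit):
    """3-7-1 weighted checksum for a 9-digit, dash-free transit number."""
    if not re.fullmatch(r"[0-9]{9}", transit):
        return False
    d = [int(c) for c in transit]
    total = (3 * (d[0] + d[3] + d[6])
             + 7 * (d[1] + d[4] + d[7])
             + (d[2] + d[5] + d[8]))
    return total % 10 == 0


# Made-up sample line, not taken from any real check.
fields = split_micr("A001234A T123456780T 12345678A0101 $0000012345$")
print(fields)
print(transit_checksum_ok(fields["transit"]))  # True
```

    Keeping the split and the checksum as plain functions makes it easy
    to run them over the test cases before tackling the on-us field.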
    Paul McGuire, Mar 25, 2007
    #4
  5. mkppk

    mkppk Guest

    On Mar 25, 12:30 am, "Paul McGuire" <> wrote:
    > Here is the basic structure of a pyparsing solution. The parsing part
    > isn't so bad - the real problem is the awful ParseONUS routine in C.

    Great, thanks for taking a look, Paul. I had never tried to use
    pyparsing before. Yeah, the ONUS field is crazy; I don't know why
    there is no standard for it.
    mkppk, Mar 25, 2007
    #5