How to 'readline' with ByteBuffer?

iksrazal

Hi all,

I have to do multiple file reads to get unordered data. What I have
works but is slow:

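import java.io.RandomAccessFile;
import java.util.Scanner;
import java.util.regex.MatchResult;

try {
    /** unordered data, requires seeks */
    RandomAccessFile raf_bsc = new RandomAccessFile(myFile, "r");
    String line;
    String btsm_id = null;
    long pointer_bsc;
    while ((line = raf_bsc.readLine()) != null) {
        if (line.indexOf("SEARCH DATA:") == -1) {
            continue;
        }
        // remember where the next line starts before the nested searches move the pointer
        pointer_bsc = raf_bsc.getFilePointer();
        Scanner sc = new Scanner(line);
        // findInLine returns null when nothing matches, so check before calling match()
        if (sc.findInLine(".+BTSM:(\\d+).*") != null) {
            MatchResult result = sc.match();
            btsm_id = result.group(1);
        }
        sc.close();

        // ... call method to do multiple file reads with seek(0) and btsm_id

        raf_bsc.seek(pointer_bsc);
    }
    raf_bsc.close();
} catch (Exception ex) { ex.printStackTrace(); }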

Basically I just need:

1) seek(long);
2) Be able to run a regex on a line.

How can I do #1 and #2 in memory, via MappedByteBuffer/ByteBuffer?

iksrazal
 
Thomas Hawtin

Basically I just need:

1) seek(long);
2) Be able to run a regex on a line.

Scanner is stream oriented. Like any buffered stream type class, it
probably isn't going to take nicely to having its source moved under it.

I'd go finding the lines from the RandomAccessFile yourself and passing
them on to java.util.regex. The source code of Scanner might help you.
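
A minimal sketch of that suggestion, not anyone's posted code: it reads lines straight from the RandomAccessFile and hands each one to java.util.regex instead of Scanner. The class and method names are made up for illustration, and the pattern is shortened to "BTSM:(\\d+)" with find(), which differs from the original ".+BTSM:(\\d+).*" only in not requiring a character before "BTSM:".

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BtsmLineSearch {
    // Compiled once and reused for every line (hypothetical shortened pattern).
    private static final Pattern BTSM = Pattern.compile("BTSM:(\\d+)");

    /** Returns the digits after "BTSM:" in the line, or null if the line has none. */
    static String extractBtsmId(String line) {
        Matcher m = BTSM.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Prints every BTSM id found on "SEARCH DATA:" lines of the given file. */
    static void printIds(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            String line;
            while ((line = raf.readLine()) != null) {
                if (!line.contains("SEARCH DATA:")) {
                    continue;
                }
                long next = raf.getFilePointer(); // byte offset of the following line
                String id = extractBtsmId(line);
                if (id != null) {
                    System.out.println(id + " (next line starts at byte " + next + ")");
                }
                // Nested seek(0) searches would go here, followed by raf.seek(next).
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            printIds(args[0]);
        } else {
            System.out.println(extractBtsmId("SEARCH DATA: x BTSM:42 y"));
        }
    }
}
```

This keeps the seek logic but does not by itself make readLine faster, since RandomAccessFile.readLine reads one byte at a time without buffering.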

How can I do #1 and #2 in memory, via MappedByteBuffer/ByteBuffer?

If your file is in UTF-16 (LE or BE) you could use a MappedByteBuffer
viewed as a CharBuffer, which java.util.regex can search directly.
Otherwise you are just going to make things difficult.

Tom Hawtin
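
The UTF-16 suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the file is UTF-16BE without a byte order mark, it is smaller than 2 GB (one mapping), and the class and method names are hypothetical. CharBuffer implements CharSequence, so a Matcher can run over the mapped file directly, and find(int) plays the role of seek.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Utf16MappedSearch {
    // MULTILINE makes ^ and $ match at line boundaries, standing in for readLine().
    private static final Pattern LINE =
            Pattern.compile("^.*SEARCH DATA:.*BTSM:(\\d+).*$", Pattern.MULTILINE);

    /** Views a UTF-16BE file as chars without copying it onto the heap. */
    static CharBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            ByteBuffer bytes = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return bytes.order(ByteOrder.BIG_ENDIAN).asCharBuffer();
        }
    }

    /** Collects the BTSM ids of all matching lines that start at or after fromChar. */
    static List<String> findIds(CharSequence text, int fromChar) {
        List<String> ids = new ArrayList<>();
        Matcher m = LINE.matcher(text);
        for (boolean found = m.find(fromChar); found; found = m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: Utf16MappedSearch <utf16be-file>");
            return;
        }
        CharBuffer text = map(Paths.get(args[0]));
        System.out.println(findIds(text, 0)); // 0 is the in-memory equivalent of seek(0)
    }
}
```

A char index in the buffer corresponds to byte offset 2 * index in the file, which is what makes the UTF-16 case easy and other encodings harder.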
 
Roedy Green

RandomAccessFile raf_bsc = new RandomAccessFile(myFile, "r");
String btsm_id = null;
while ((line = raf_bsc.readLine()) != null) {
    if (line.indexOf("SEARCH DATA:") == -1) {
        continue;
    }
    pointer_bsc = raf_bsc.getFilePointer();
    Scanner sc = new Scanner(line);
    while (sc.hasNext()) {
        sc.findInLine(".+BTSM:(\\d+).*");
        MatchResult result = sc.match();
        btsm_id = result.group(1);
    }
    sc.close();

I am puzzled by what you are doing there. readLine is something you do
with sequential files, not random-access ones. With a RandomAccessFile
you might seek to a spot where a counted UTF string containing \n chars
was stored. Random-access files are rarely pure text files. They
usually hold fixed-length binary records, or variable-length records
with some sort of external index to find the start (and possibly
length) of each record.

I need to know more about your file structure and the problem you are
trying to solve.

Another thing you can do is seek to a spot in the random-access file,
considered as a giant pool of bytes, and read, say, 16K into a byte[].
See http://mindprod.com/applets/fileio.html for how. Then set up a
ByteArrayInputStream wrapped in an InputStreamReader to read the chunk,
or the first bit of it, sequentially as chars or strings. Again see
http://mindprod.com/applets/fileio.html for how to do that.
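
A rough sketch of the chunk approach described above, offered as one possible reading of it rather than the code from the linked page. It assumes a single-byte encoding (ISO-8859-1 here) and uses a hypothetical ChunkReader class. Note that a chunk boundary will usually cut the first and last lines in half, so real code has to stitch chunks together or re-seek to a line start.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class ChunkReader {
    /** Seeks to offset and reads up to maxBytes; the result is shorter near end of file. */
    static byte[] readChunk(RandomAccessFile raf, long offset, int maxBytes) throws IOException {
        raf.seek(offset);
        byte[] buf = new byte[maxBytes];
        int total = 0;
        while (total < maxBytes) {
            int n = raf.read(buf, total, maxBytes - total);
            if (n < 0) {
                break; // end of file
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }

    /** Splits an in-memory chunk into lines; the first and last line may be partial. */
    static List<String> linesOf(byte[] chunk, Charset charset) {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(chunk), charset));
        return reader.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ChunkReader <file>");
            return;
        }
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            byte[] chunk = readChunk(raf, 0L, 16 * 1024); // 16K, as suggested above
            linesOf(chunk, StandardCharsets.ISO_8859_1).forEach(System.out::println);
        }
    }
}
```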
 
iksrazal

Hi Roedy, thanks for responding,

Roedy Green wrote:
I am puzzled by what you are doing there. readLine is something you do
with sequential files, not random-access ones. With a RandomAccessFile
you might seek to a spot where a counted UTF string containing \n chars
was stored. Random-access files are rarely pure text files. They
usually hold fixed-length binary records, or variable-length records
with some sort of external index to find the start (and possibly
length) of each record.

I need to know more about your file structure and the problem you are
trying to solve.

I have a pure text file.

Node A has a one-to-many relationship with Node B, and Node B has a
one-to-many relationship with Node C. I first find Node A. All the
Node Bs could be anywhere in the file. Once I find the first Node B, I
then find all the Node Cs, which can also be anywhere in the file.

I find Node A, save the file pointer via
RandomAccessFile.getFilePointer(), call RandomAccessFile.seek(0) and
find all the Node Bs, then repeat for the Node Cs. I'm using regex to
get the Node values, and as such I'm thinking I need a readLine. I
tried wrapping it in a BufferedReader, but it seemingly didn't like the
seeks.

Another thing you can do is seek to a spot in the random-access file,
considered as a giant pool of bytes, and read, say, 16K into a byte[].
See http://mindprod.com/applets/fileio.html for how. Then set up a
ByteArrayInputStream wrapped in an InputStreamReader to read the chunk,
or the first bit of it, sequentially as chars or strings. Again see
http://mindprod.com/applets/fileio.html for how to do that.

Could that work by splitting the bytes into lines, separating on \n,
and then converting each line's bytes into a String? Or could I somehow
run a regex on a byte array?
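
One way the "regex on bytes" question could be approached, sketched under a loud assumption: the file uses a single-byte encoding such as ASCII or ISO-8859-1, so each byte decodes to exactly one char and a char index equals a byte offset. java.util.regex works on CharSequence, not on byte arrays, so the bytes are decoded first. The ByteRegex class and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ByteRegex {
    /** Decodes bytes one-to-one into chars (copies the data onto the heap). */
    static CharSequence asChars(ByteBuffer bytes) {
        // duplicate() leaves the caller's position and limit untouched
        return StandardCharsets.ISO_8859_1.decode(bytes.duplicate());
    }

    /** Offsets at which lines containing the marker start. */
    static List<Integer> lineStarts(CharSequence text, String marker) {
        Pattern p = Pattern.compile("^.*" + Pattern.quote(marker) + ".*$", Pattern.MULTILINE);
        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    /** In-memory stand-in for seek(offset) followed by readLine(). */
    static String lineAt(CharSequence text, int offset) {
        int end = offset;
        while (end < text.length() && text.charAt(end) != '\n' && text.charAt(end) != '\r') {
            end++;
        }
        return text.subSequence(offset, end).toString();
    }

    public static void main(String[] args) {
        byte[] data = "x\nSEARCH DATA: 1\ny\n".getBytes(StandardCharsets.ISO_8859_1);
        CharSequence text = asChars(ByteBuffer.wrap(data));
        for (int start : lineStarts(text, "SEARCH DATA:")) {
            System.out.println(start + ": " + lineAt(text, start));
        }
    }
}
```

A MappedByteBuffer obtained from FileChannel.map is a ByteBuffer, so it can be passed to asChars as well, provided the decoded text fits in memory.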

In the end, what I have works but is slow. My first idea is to do the
search in memory, which should be faster than searching in a file.
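
For the in-memory idea, here is a hedged sketch of one alternative to rescanning from seek(0): read the file once and group lines by id, so each later lookup is a map access. It assumes, without confirmation from the actual file format, that every related line carries a "BTSM:<digits>" token, and that the file fits in memory. LineIndex and indexByKey are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineIndex {
    // Assumed key format; adjust to whatever identifies a node in the real file.
    private static final Pattern KEY = Pattern.compile("BTSM:(\\d+)");

    /** Groups lines by the id they mention, in one pass and in file order. */
    static Map<String, List<String>> indexByKey(List<String> lines) {
        Map<String, List<String>> index = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = KEY.matcher(line);
            if (m.find()) {
                index.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(line);
            }
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: LineIndex <file>");
            return;
        }
        // One sequential read replaces the repeated seek(0) scans.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        indexByKey(lines).forEach((id, hits) -> System.out.println(id + " -> " + hits.size() + " line(s)"));
    }
}
```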

Thanks,
iksrazal
 
