Reading a very large text file into an array


Markus Hofmann

Hi @ all

hope someone can help me as my PC is ready to go through the
window...here it is:

I have a text file with approximately 150,000 lines of the following
format:

0007391027,000049-0458-1556-09141999,0023924296,
0007391028,001217-0671-1610-09141999,0023924302,
0007391029,004581-0671-1630-09141999,0023924313,
0007391030,001110-0433-1636-09141999,0023924317,
0007391031,007651-0665-1648-09141999,0023924320,


How can I read the file into memory, into one array?

The aim is that I can compare various strings throughout the entire
file. I have another version working, but it has to re-read each line
of the text file after comparing it with the various strings of other
lines. This takes ages, and as I have approximately 700 of these
files it is not an option.
 

Ivan Vecerina

| I have a text file with approximately 150,000 lines of the following
| format:
|
| 0007391027,000049-0458-1556-09141999,0023924296,
| 0007391028,001217-0671-1610-09141999,0023924302,
| 0007391029,004581-0671-1630-09141999,0023924313,
| 0007391030,001110-0433-1636-09141999,0023924317,
| 0007391031,007651-0665-1648-09141999,0023924320,
|
| How can I read the file into the memory/one array.

The easy way to do so in C++ is:
#include <vector>
#include <string>
#include <fstream>
using namespace std;

....
// srcFile is assumed to be an std::ifstream that was opened successfully
string str;
vector<string> buf;
// optionally call buf.reserve() first, or use a std::deque instead
while( getline( srcFile,str ) )
    buf.push_back(str);

| The aim is that I can compare various strings throughout the entire
| file. I have another version working but that has to re-read each line
| of the text file after comparing it with the various strings of other
| lines. This takes for ages and as I have approximately 700 of these
| files it is not an option....

To do things efficiently, you'll probably want to decode each line
as you read it (for example into a struct with integer fields).
A lot can be done to improve performance, but the best approach
depends on the type of processing needed...

hth,
 

Thomas Matthews

Markus said:
| How can I read the file into the memory/one array.
|
| The aim is that I can compare various strings throughout the entire
| file. I have another version working but that has to re-read each line
| of the text file after comparing it with the various strings of other
| lines. This takes for ages and as I have approximately 700 of these
| files it is not an option....

Here is a suggestion:
Split the line into fields.
1. Open the file in binary mode.
2. Record the current file position (i.e. the beginning of the line).
3. Read in the line and extract the field you want.
4. Store the file position in a map, using the extracted field as the key:
   index[/* key */] = file_position; // index is a std::map from key to file position
5. Repeat for the entire file.

The above builds an index table. Use the index table when
searching: a lookup returns the file position, so you can
seek to that position and then read the line.

Other than indices, you would have to reorganize your data
to get a faster search.

Search (or even post to) a generic database newsgroup.
They may have some solutions. The folks in may also have some suggestions.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

Jerry Coffin

| How can I read the file into the memory/one array.

A memory mapped file is what you probably want, but that's not portable.

In portable code, if you want the entire file as a single string, you
can use something like:

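#include <fstream>
#include <sstream>
#include <string>

std::ifstream infile("whatever name");
std::stringstream temp;

temp << infile.rdbuf();             // copy the entire stream buffer in one go
std::string contents = temp.str();  // a copy; a non-const reference cannot bind to the temporary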

Now 'contents' is the entire contents of the file. This method is
typically faster than many of the obvious alternatives like reading the
file one line at a time into a vector of strings.
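
Once the file is held in a single string, the individual lines still have to be located before they can be compared. One possible way, sketched here with the made-up helper names readAll and lineStarts, is to record the offset at which each line begins instead of copying every line into its own string.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read everything that is left in the stream into one string.
std::string readAll(std::istream& in)
{
    std::ostringstream temp;
    temp << in.rdbuf();
    return temp.str();
}

// Return the offset of the first character of every line in 'contents'.
std::vector<std::size_t> lineStarts(const std::string& contents)
{
    std::vector<std::size_t> starts;
    std::size_t pos = 0;
    while (pos < contents.size()) {
        starts.push_back(pos);
        std::size_t nl = contents.find('\n', pos);
        if (nl == std::string::npos)
            break;              // last line has no trailing newline
        pos = nl + 1;
    }
    return starts;
}
```

Two lines can then be compared in place, for example with std::string::compare(pos, len, ...), using the offsets in the returned vector.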
 
