General queries.

SK

Hey folks,

I have some questions.
Could someone please answer them for me?

Problem background: I am parsing a text file. I need to find the
number of times an expression occurred in the file.

[Q] >>> I open an ifstream. Is there a need to explicitly close the
ifstream? I think not.
[Q] >>> I need to store the count of times an expression occurred in
the file. I use map<std::string, long> as the data structure. Is there
a more efficient data structure?
[Q] >>> I have numbers stored in the form of strings in the txt file,
like "12366555590". I need a count of the numbers which have a
prefix of "123". I am forced to use std::string as storage for
"12366555590". Then I do strNumber.substr(0, 3) == "123" to check if
the prefix is "123". Is there a more efficient way to do this using
arithmetic types?
[Q] >>> All my TUs require <string>, only some require <algorithm>,
only some require <vector>, etc. Is it a good idea to keep one common
header file and include it in each TU? That way I include unnecessary
headers in some TUs. Alternatively, I can keep a minimalist common header
file (with, say, <string>) and include it in all TUs. Then each TU
explicitly includes <algorithm> or <vector>, or whatever it requires.
Which is the better approach? I think the second one.
[Q] >>> I can't anticipate all kinds of input files that could cause
exceptions to be thrown. I have written two global handlers in
main: catch(Exception& e) and catch(...). Can I somehow also find out which
line in the input text file caused the exception?

Thank you.
 
Kai-Uwe Bux

SK said:
Hey folks,

I have some questions.

I have only *some* answers.
[Q] >>> I need to store the count of times an expression occurred in
the file. I use map<std::string, long> as the data structure. Is there
some more efficient data structure.

Among the standard containers, this is the one designed for this type of
problem. You might want to try out different implementations of the
Standard Library to find the most efficient one.

You might also want to look into hashed versions of the map container. Some
vendors ship those and make them available in some other namespace. But
this will render your code non-portable.
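
To make the counting approach concrete, here is a minimal sketch. It assumes an "expression" is simply a whitespace-separated token, which the original post does not state; the names CountMap, countTokens and countTokensInText are made up for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Illustrative alias; the thread only names map<std::string, long>.
typedef std::map<std::string, long> CountMap;

// Reads whitespace-separated tokens and counts each one.
// operator[] value-initializes a missing entry to 0 before the increment.
CountMap countTokens(std::istream& in) {
    CountMap counts;
    std::string token;
    while (in >> token) {
        ++counts[token];
    }
    return counts;
}

// Convenience wrapper so the function can be tried on an in-memory string.
CountMap countTokensInText(const std::string& text) {
    std::istringstream in(text);
    return countTokens(in);
}
```

Since C++11 the standard library itself provides std::unordered_map, which offers the same operator[] interface, so changing the typedef is enough to try a hashed container portably. Whether it is actually faster for a given input is something to measure.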
[Q] >>> I have numbers stored in the form of strings in the txt file
like "12366555590". I need to have a count of numbers which have a
prefix of "123". I am forced to use std::string as storage for
"12366555590". Then i do strNumber.substr(0, 3) == "123" to check if
prefix is "123". Is there an efficient way I could do using arithmetic
types.

Not as efficient as the code that you are using. Any arithmetic would have
to do a conversion first which probably entails visiting all digits in the
string. Your code avoids that and looks only at the first three. That seems
to be the best you can do.
[Q] >>> All my TU's require <string>, only some require <algorithm>,
only some require <vector> etc. Is it a good idea to keep one common
header file and include in each TU. But this way I include unnecessary
headers in some TUs. Else i can keep a minimalist common header
file (with say <string>) and include in all TUs. Then each TU
explicitly includes <algorithm> or <vector>, whatsoever it requires.
Which is a better approach? I think second one.

I agree. But my C++ coding style has not really matured yet. You will want
second opinions on this one. I like my files to clearly indicate at the top
what they need.


Best regards

Kai-Uwe
 
Kai-Uwe Bux

Kai-Uwe Bux said:
SK said:
Hey folks,

I have some questions.

I have only *some* answers.
[Q] >>> I need to store the count of times an expression occurred in
the file. I use map<std::string, long> as the data structure. Is there
some more efficient data structure.

Among the standard containers, this is the one designed for this type of
problem. You might want to try out different implementations of the
Standard Library to find the most efficient one.

You might also want to look into hashed versions of the map container.
Some vendors ship those and make them available in some other namespace.
But this will render your code non-portable.
[Q] >>> I have numbers stored in the form of strings in the txt file
like "12366555590". I need to have a count of numbers which have a
prefix of "123". I am forced to use std::string as storage for
"12366555590". Then i do strNumber.substr(0, 3) == "123" to check if
prefix is "123". Is there an efficient way I could do using arithmetic
types.

Not as efficient as the code that you are using. Any arithmetic would have
to do a conversion first which probably entails visiting all digits in the
string. Your code avoids that and looks only at the first three. That
seems to be the best you can do.

Sorry, I was not thinking properly. Does it so happen that you have already
built the std::map< std::string, long > from the other question for this
file? If so, then there is a very fast way to obtain the head count. But
that does not involve arithmetic representations of the strings either.
Thus the answer to your question remains "no".
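
The post does not spell out the "very fast way". One plausible reading, offered here as an assumption, is to exploit the fact that std::map keeps its keys sorted: every key starting with "123" lies in the range from "123" (inclusive) to "124" (exclusive). The name countWithPrefix123 is hypothetical.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, long> CountMap;

// Sums the counts of all keys that start with "123".
// In lexicographic order those keys are exactly the ones that are
// >= "123" and < "124", so two logarithmic lookups bound the range.
long countWithPrefix123(const CountMap& counts) {
    CountMap::const_iterator first = counts.lower_bound("123");
    CountMap::const_iterator last  = counts.lower_bound("124");
    long total = 0;
    for (; first != last; ++first) {
        total += first->second;
    }
    return total;
}
```

Only the matching entries are visited; the rest of the map is skipped. This would not work with a hashed container, because its keys are not ordered.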


Best regards

Kai-Uwe
 
Rolf Magnus

SK said:
Hey folks,

I have some questions.
Could someone please answer them for me.

Problem background - I am parsing a text file. I need to find the
number of times an expression occurred in the file.

[Q] >>> I open ifstream. Is there a need to explicitly close the
ifstream, i think not.

Right. It is automatically closed when the stream object is destroyed.
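
A small sketch of what that looks like in practice. The function name countLines and its path parameter are invented for the example.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Counts the lines in the file at the given path.
// Returns 0 if the file cannot be opened, because the first getline fails.
std::size_t countLines(const std::string& path) {
    std::ifstream in(path.c_str());   // file is opened here
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lines;
    }
    return lines;
}   // 'in' is destroyed here and its destructor closes the file
```

There is no call to close() anywhere. An explicit close() is only worth writing when the file has to be released before the stream object goes out of scope.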
[Q] >>> I need to store the count of times an expression occurred in
the file. I use map<std::string, long> as the data structure. Is there
some more efficient data structure.

It seems like a good choice.
[Q] >>> I have numbers stored in the form of strings in the txt file
like "12366555590". I need to have a count of numbers which have a
prefix of "123". I am forced to use std::string as storage for
"12366555590". Then i do strNumber.substr(0, 3) == "123" to check if
prefix is "123". Is there an efficient way I could do using arithmetic
types.

I thought you are forced to use strings?
[Q] >>> All my TU's require <string>, only some require <algorithm>,
only some require <vector> etc. Is it a good idea to keep one common
header file and include in each TU. But this way I include unnecessary
headers in some TUs. Else i can keep a minimalist common header
file (with say <string>) and include in all TUs. Then each TU
explicitly includes <algorithm> or <vector>, whatsoever it requires.
Which is a better approach? I think second one.

IMHO, it's better to only #include the headers that you need and nothing
else.
[Q] >>> I can't anticipate all kinds of input files due to which
exceptions could be thrown. I have written two global handlers in
main catch(Exception& e),

The standard exception base class is std::exception, and by default,
streams don't throw if they get into fail state.
catch(...). Can I somehow also know which line in the input text file
was the reason for exception.

No.
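
On the remark above that streams don't throw by default: a stream can be asked to throw through its exceptions() mask. The following is a minimal sketch; the helper name openThrows is hypothetical and exists only to show the behaviour.

```cpp
#include <exception>
#include <fstream>
#include <ios>
#include <string>

// Returns true if opening the file at 'path' raised an exception.
bool openThrows(const std::string& path) {
    std::ifstream in;
    // Ask the stream to throw std::ios_base::failure whenever
    // failbit or badbit becomes set.
    in.exceptions(std::ifstream::failbit | std::ifstream::badbit);
    try {
        in.open(path.c_str());   // a failed open sets failbit
    } catch (const std::exception&) {
        return true;             // ios_base::failure derives from std::exception
    }
    return false;
}
```

Be aware that with failbit in the mask, a read loop such as while (std::getline(in, line)) also throws when it reaches the end of the file, because the final failed read sets failbit. Enabling only badbit avoids that.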
 
Michiel Salters

[Q] >>> I open ifstream. Is there a need to explicitly close the
ifstream, i think not.

No - done by ifstream::~ifstream
[Q] >>> I need to store the count of times an expression occurred in
the file. I use map<std::string, long> as the data structure. Is there
some more efficient data structure.

unordered_map
[Q] >>> I have numbers stored in the form of strings in the txt file
like "12366555590". I need to have a count of numbers which have a
prefix of "123". I am forced to use std::string as storage for
"12366555590". Then i do strNumber.substr(0, 3) == "123" to check if
prefix is "123". Is there an efficient way I could do using arithmetic
types.

No, and your solution is relatively expensive too. The .substr()
function creates a new std::string, and "123" is converted repeatedly
to an std::string as well (calling strlen repeatedly). The better
solution is a simple function:

inline bool isPrefix123( std::string const& s ) {
    return s.size() >= 3 && s[0]=='1' && s[1]=='2' && s[2]=='3';
}
No temporary strings, directly checking the original characters.
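
If the prefix is not fixed at compile time, the same idea can be generalized. This is a sketch, not part of the original reply; hasPrefix is a made-up name, and it relies on the std::string::compare overload that takes a position, a length and another string.

```cpp
#include <string>

// True if 's' starts with 'prefix'. compare() looks at the characters
// of 's' in place, so no substring object is created.
inline bool hasPrefix(const std::string& s, const std::string& prefix) {
    return s.size() >= prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0;
}
```

To avoid rebuilding the prefix on every call, construct it once outside the loop, for example const std::string prefix("123"); and pass that object in.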
[Q] >>> All my TU's require <string>, only some require <algorithm>,
only some require <vector> etc. Is it a good idea to keep one common
header file and include in each TU. But this way I include unnecessary
headers in some TUs. Else i can keep a minimalist common header
file (with say <string>) and include in all TUs. Then each TU
explicitly includes <algorithm> or <vector>, whatsoever it requires.
Which is a better approach? I think second one.

With precompiled headers, the common header file solution compiles faster.
Without, the minimalist solution compiles faster. There is no resulting
quality difference.
[Q] >>> I can't anticipate all kinds of input files due to which
exceptions could be thrown. I have written two global handlers in
main catch(Exception& e), catch(...). Can I somehow also know which
line in the input text file was the reason for exception.

Well, if you can't anticipate the kind of input files, you have to
deal with binary files, so the concept of a line is somewhat unclear.
However, you should know precisely what exceptions your parser emits;
you wrote it! Why not include the line number in the exception?
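
One way to carry the line number in the exception, as a sketch. ParseError and parseNumbers are hypothetical names, and the "digits only" rule is just a stand-in for whatever the real parser checks.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Exception type that remembers the offending input line.
class ParseError : public std::runtime_error {
public:
    ParseError(std::size_t line, const std::string& what)
        : std::runtime_error(format(line, what)), line_(line) {}
    std::size_t line() const { return line_; }
private:
    static std::string format(std::size_t line, const std::string& what) {
        std::ostringstream os;
        os << "line " << line << ": " << what;
        return os.str();
    }
    std::size_t line_;
};

// Stand-in parser: rejects any line containing a non-digit character.
void parseNumbers(std::istream& in) {
    std::string text;
    std::size_t lineNo = 0;
    while (std::getline(in, text)) {
        ++lineNo;   // lines are numbered from 1
        if (text.find_first_not_of("0123456789") != std::string::npos) {
            throw ParseError(lineNo, "expected digits only");
        }
    }
}
```

Because ParseError derives from std::runtime_error, a handler for std::exception& in main still catches it and what() already contains the line number.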

Regards,
Michiel Salters
 
