No. of 'a' in a text file

James Kanze · Apr 22, 2007

Ian Collins said:

Laudable, but you have introduced at least one fresh bug, by making ab
into an unsigned int...

...without fixing the printf to match.

It's worth pointing out that this was also cross-posted to
comp.lang.c++, and that C++ has better ways of handling this
problem: in C++ code, I'd use std::deque (for the general case,
anyway) and istream, for example, to maintain a sliding two
character window in the file. In C, I'd probably simulate the
use of deque to acheive the same thing---a two character queue
is pretty easy to program. Alternatively, a simple state
machine is an efficient solution in both languages.

In C, for this specific case, I'd probably write something like:

#include <stdio.h>
#include <stdlib.h>

int
main()
{
FILE* f = fopen( "somefile.txt", "r" ) ;
if ( f == NULL ) {
fprintf( stderr, "cannot open: %s\n", "somefile.txt" ) ;
exit( 2 ) ;
}
int prev = '\0' ;
int count = 0 ;
for ( int ch = getc( f ) ; ch != EOF ; ch = getc( f ) ) {
if ( prev == 'a' && ch == 'b' ) {
++ count ;
}
prev = ch ;
}
printf( "%d\n", count ) ;
return 0 ;
}

(I think that this is 100% C. At any rate, gcc -pedantic
-std=c99 -Wall compiles it without warnings.)

pete · Apr 22, 2007

/* BEGIN new.c */

#include <stdio.h>
#include <string.h>

#define STRING "abcdef"

int main(void)
{
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;
/*
** Just make sure that the enum states cover the STRING
** and you can change STRING to whatever you want
** without having to change any other lines in this program.
*/
int c;
FILE *fp;
char *letter;
char *fn = "c:/1.txt";
long unsigned count = 0;
const char* const string = STRING;

fp = fopen(fn, "r");
if (fp != NULL) {
while ((c = getc(fp)) != EOF) {
if (c == STRING[sizeof STRING - 2]) {
if (waiting_for == sizeof STRING - 2) {
++count;
}
waiting_for = A;
} else {
letter = strchr(string, c);
if (letter == NULL
|| waiting_for++ != letter - string)
{
waiting_for = A;
}
}
}
printf("No. of '%s' = %lu\n", string, count);
fclose(fp);
} else {
printf("fopen problem with %s\n", fn);
}
return 0;
}

/* END new.c */

pete · Apr 22, 2007

#define STRING "abcdef"

int main(void)
{
enum states {A = 0,B,C,D,E,F,G,H,I,J,K} waiting_for = A;
/*
** Just make sure that the enum states cover the STRING
** and you can change STRING to whatever you want
** without having to change any other lines in this program.
*/

Actually, the following line of code

letter = strchr(string, c);

prevents the program from working correctly
with strings like "baby",
that have multiple occurances of letters.

Bruce !C!+ · Apr 22, 2007

my be it will like this,
while(ch=getc(f)!=EOF)-----------while(ch=getc(f)&&ch!=EOF)

{ //do something

Gianni Mariani · Apr 22, 2007

Umesh said:
/*Calculate the no. of occurance of 'ab'. But is this one OK/
EFFICIENT?*/
#include<stdio.h>
int main(void)
{
FILE *f;
long int c,c1,ab=1;
f=fopen("c:/1.txt","r");
while ((c=getc(f))!=EOF && (c1=getc(f))!=EOF) {
if(c=='a' && c1=='b') ab++;}
fclose(f);
printf("No. of 'ab' = %ld\n",ab);
return 0;
}

That will not work, it will fail for sequences like 'xab'.

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to read
from a file (should be trivial). It's not designed for speed.

#include <map>
#include <vector>
#include <memory>
#include <cassert>

// ======== IteratorTranverser =======================================
/**
* IteratorTranverser is a template class that iterates through
* a pointer or iterator. The pointers passed to it must be valid
* for the life-time of this object.
*/

template <typename Itr, typename t_key_char>
struct IteratorTranverser
{

Itr m_from;
const Itr m_end;

IteratorTranverser( const Itr & i_from, const Itr & i_end )
: m_from( i_from ),
m_end( i_end )
{
}

bool GetChar( t_key_char & l_char )
{
if ( m_from != m_end )
{
l_char = * ( m_from ++ );
return true;
}

return false;
}

bool HasInput( bool i_wait )
{
return m_from != m_end;
}

};

// ======== CombiningTraverser ========================================
/**
*
*
*/

template <typename TraverserTypeFirst, typename TraverserTypeSecond, typename t_key_char>
struct CombiningTraverser
{

TraverserTypeFirst & m_first;
TraverserTypeSecond & m_second;
bool m_use_second;

CombiningTraverser(
TraverserTypeFirst & io_first,
TraverserTypeSecond & io_second
)
: m_first( io_first ),
m_second( io_second ),
m_use_second( false )
{
}

bool GetChar( t_key_char & l_char )
{
if ( ! m_use_second )
{
if ( m_first.GetChar( l_char ) )
{
return true;
}
m_use_second = true;
}

return m_second.GetChar( l_char );
}

bool HasInput( bool i_wait )
{
if ( ! m_use_second )
{
if ( m_first.HasInput( i_wait ) )
{
return true;
}
m_use_second = true;
}

return m_second.HasInput( i_wait );
}

};

/**
* SimpleScanner is a simple scanner generator
*/

template <typename t_key_char, typename t_result>
class SimpleScanner
{
/**
* DFA_State contains a list of transitionstransitions
*/

struct DFA_State
{
typedef std::map<t_key_char, DFA_State *> t_map_type;
typedef typename t_map_type::iterator t_iterator;
t_map_type m_transitions;

t_result m_terminal;
bool m_has_val;

DFA_State()
: m_terminal(),
m_has_val( false )
{
}

/**
* FindOrInsertTransition is used to construct the scanner
*/

DFA_State * FindOrInsertTransition( t_key_char i_char )
{
std:

air<t_iterator, bool> l_insert_result =
m_transitions.insert( typename t_map_type::value_type( i_char, 0 ) );

if ( ! l_insert_result.second )
{
return l_insert_result.first->second;
}

return l_insert_result.first->second = new DFA_State;
}

/**
* FindTransition is used to traverse the scanner
*/

DFA_State * FindTransition( t_key_char i_char )
{
t_iterator l_insert_result =
m_transitions.find( i_char );

if ( l_insert_result != m_transitions.end() )
{
return l_insert_result->second;
}

return 0;
}

};

struct DFA_Machine
{

DFA_State * m_initial_state;
DFA_State * m_current_state;
DFA_State * m_last_accept_state;
std::vector<t_key_char> m_str;

DFA_Machine( DFA_State * i_initial_state )
: m_initial_state( i_initial_state ),
m_current_state( i_initial_state ),
m_last_accept_state( 0 )
{
}

/**
* NextChar will traverse the state machine with the next
* character and return the terminal t_result if one exists.
* If i_char does not make a valid transition, o_valid
* is set to false.
*/
bool NextChar( t_key_char i_char )
{
m_str.push_back( i_char );
DFA_State * l_next_state = m_current_state->FindTransition( i_char );

if ( l_next_state )
{
m_current_state = l_next_state;

// If there is an accepting state then we
// can roll back the push-back buffer.
if ( l_next_state->m_has_val )
{
m_last_accept_state = l_next_state;
m_str.clear();

}

return true;
}

m_current_state = m_initial_state;
return false;
}

template <typename Traverser>
bool ScanStream( Traverser & io_traverser, t_result & o_result )
{
t_key_char l_char;

while ( io_traverser.GetChar( l_char ) )
{
bool i_valid;

i_valid = NextChar( l_char );

DFA_State * l_last_accept_state = m_last_accept_state;

// If there are no more transitions or the last
if ( ( ! i_valid ) || ( m_current_state->m_transitions.size() == 0 ) )
{
if ( l_last_accept_state )
{
m_last_accept_state = 0;
m_current_state = m_initial_state;
if ( l_last_accept_state->m_has_val )
{
o_result = l_last_accept_state->m_terminal;
return true;
}
}
return false;
}

// There are transitions ...
assert( m_current_state->m_transitions.size() != 0 );

// If there are transitions (true here) and this is an interactive
// scan (waiting for user input) then wait a little longer, if there
// are no accept states - wait forever (which means calling GetChar).

if ( l_last_accept_state )
{
if ( ! io_traverser.HasInput( true ) )
{
// there is no longer any pending input. We're done.
m_last_accept_state = 0;
m_current_state = m_initial_state;
o_result = l_last_accept_state->m_terminal;
return true;
}
}
}

return false;
}

template <typename TraverserType>
bool DoScan( TraverserType & io_traverser, t_result & o_result )
{
std::vector<t_key_char> l_str = std::vector<t_key_char>();
l_str.swap( m_str );

if ( l_str.size() != 0 )
{
IteratorTranverser< typename std::vector<t_key_char>::iterator, t_key_char > l_tvsr(
l_str.begin(),
l_str.end()
);

CombiningTraverser<
IteratorTranverser< typename std::vector<t_key_char>::iterator, t_key_char >,
TraverserType,
t_key_char

> l_combined( l_tvsr, io_traverser );

bool l_scanned = ScanStream( l_combined, o_result );

// may still have content locally - push that back into the
m_str.insert( m_str.end(), l_tvsr.m_from, l_tvsr.m_end );

return l_scanned;
}
else
{
return ScanStream( io_traverser, o_result );
}

return false;
}

bool HasInput( bool )
{
return m_str.size() != 0;
}

bool GetChar( t_key_char & l_char )
{
if ( m_str.size() != 0 )
{
l_char = m_str.front();
m_str.erase( m_str.begin() );
return true;
}
return false;
}

};

struct Scanner
{
DFA_State * m_initial_state;

Scanner()
: m_initial_state( new DFA_State )
{
}

DFA_Machine * NewMachine()
{
return new DFA_Machine( m_initial_state );
}

/**
* AddTerminal will add a terminal and will return the colliding
* terminal (if there is one)
*/

template <typename t_iterator>
bool AddTerminal(
int i_length,
t_iterator i_str,
const t_result & i_kd,
t_result & o_result
) {

DFA_State * l_curr_state = m_initial_state;

t_iterator l_str = i_str;

for ( int i = 0; i < i_length; ++ i )
{
DFA_State * l_next_state = l_curr_state->FindOrInsertTransition( * l_str );

++ l_str;

l_curr_state = l_next_state;
}

if ( l_curr_state->m_has_val )
{
// We have a collision !
o_result = l_curr_state->m_terminal;
return true;
}

l_curr_state->m_terminal = i_kd;
l_curr_state->m_has_val = true;

#if 0
// actually test the scanner to make sure that we decode what we expect
// to decode
std::auto_ptr<DFA_Machine> l_machine( NewMachine() );

IteratorTranverser< t_iterator > l_tvsr( i_str, i_str + i_length );

const t_result * l_kd2 = l_machine->ScanStream( l_tvsr );

// assert( l_kd2 == i_kd );

return 0;
#endif
return false;

}
};

Scanner m_scanner;

public:

struct Machine
{

DFA_Machine * m_machine;

Machine()
: m_machine( 0 )
{
}

~Machine()
{
if ( m_machine )
{
delete m_machine;
}
}

bool HasInput( bool )
{
if ( m_machine )
{
return m_machine->HasInput( false );
}
return false;
}

bool GetChar( t_key_char & l_char )
{
if ( m_machine )
{
return m_machine->GetChar( l_char );
}
return false;
}

private:

// no copies allowed
Machine( const Machine & );
Machine & operator=( const Machine & );

};

template <typename TraverserType>
bool Traverse( Machine & i_machine, TraverserType & io_traverser, t_result & o_kd )
{
DFA_Machine * l_machine = i_machine.m_machine;

if ( ! l_machine )
{
l_machine = i_machine.m_machine = m_scanner.NewMachine();
}

return l_machine->DoScan( io_traverser, o_kd );

}

bool AddTerminal(
int i_length,
const t_key_char * i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( i_length, i_str, i_kd, o_result );

}

bool AddTerminal(
const t_key_char * i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( std::strlen( i_str ), i_str, i_kd, o_result );

}

template < typename t_container >
bool AddTerminal(
const t_container i_str,
const t_result & i_kd,
t_result & o_result
) {

return m_scanner.AddTerminal( i_str.size(), i_str.begin(), i_kd, o_result );

}

}; // SimpleScanner

#include <string>
#include <iostream>
#include <ostream>
#include <istream>

class NoisyStr
{
public:
std::string m_value;

NoisyStr()
: m_value( "unassigned" )
{
}

NoisyStr( const std::string & i_value )
: m_value( i_value )
{
}

NoisyStr( const char * i_value )
: m_value( i_value )
{
}

NoisyStr( const NoisyStr & i_value )
: m_value( i_value.m_value )
{
std::cout << "Copied " << m_value << "\n";
}

NoisyStr & operator=( const NoisyStr & i_value )
{
std::cout << "Assigned " << m_value;
m_value = i_value.m_value;
std::cout << " to " << m_value << "\n";
return * this;
}

const char * c_str()
{
return m_value.c_str();
}
};

typedef std::string KeyType;

int main()
{
SimpleScanner< char, KeyType > l_scanner;

KeyType l_collision;

l_scanner.AddTerminal( "abcde", "ZZ", l_collision );
l_scanner.AddTerminal( "xyz", "YY", l_collision );
l_scanner.AddTerminal( "dx_", "DX", l_collision );

static const char l_test[] = "abcde_test_abcdx_xyz";

std::cout << "scanning " << l_test << std::endl;

IteratorTranverser< const char *, char > l_trav( l_test, l_test + sizeof( l_test ) -1 );

SimpleScanner< char, KeyType >::Machine machine;

KeyType l_result;

while (true )
{
if ( l_scanner.Traverse( machine, l_trav, l_result ) )
{
std::cout << l_result.c_str();
}
else
{
char l_ch;

if ( ! machine.GetChar( l_ch ) )
{
if ( ! l_trav.GetChar( l_ch ) )
{
break;
}
}

std::cout << l_ch;

}
}

std::cout << std::endl;

} // main

Richard Heathfield · Apr 22, 2007

Gianni Mariani said:

That will not work, it will fail for sequences like 'xab'.

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to
read
from a file (should be trivial). It's not designed for speed.

Nor for brevity - not at almost 600 lines.

Nor, alas, did gcc like it very much:

foo.c:1: map: No such file or directory
foo.c:2: vector: No such file or directory
foo.c:3: memory: No such file or directory
foo.c:4: cassert: No such file or directory
foo.c:250: unterminated character constant
foo.c:484: string: No such file or directory
foo.c:485: iostream: No such file or directory
foo.c:486: ostream: No such file or directory
foo.c:487: istream: No such file or directory
make: *** [foo.o] Error 1

Gianni Mariani · Apr 22, 2007

Richard said:
Gianni Mariani said:

Nor for brevity - not at almost 600 lines.

Nor, alas, did gcc like it very much:

....

Try naming the file with a .cpp (C++ extension) and compiling it with a
C++ compiler.

Richard Heathfield · Apr 22, 2007

Gianni Mariani said:

...

Try naming the file with a .cpp (C++ extension) and compiling it with
a C++ compiler.

No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.

Army1987 · Apr 22, 2007

In C, for this specific case, I'd probably write something like: [snip]
FILE* f = fopen( "somefile.txt", "r" ) ;

What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

[snip]

exit( 2 ) ;

The behaviour of this is implementation-defined. On the DS9K,
exit(2) makes the program securely erase the whole disk on exit.
The standard way to do that is exit(EXIT_FAILURE);

int prev = '\0' ;

You can use declarations after statements only in C99. No problem
if you have a C99-compliant compiler (gcc isn't). The same for
declarations within the guard of a for loop.

Army1987 · Apr 22, 2007

Bruce !C!+ said:
my be it will like this,
while(ch=getc(f)!=EOF)-----------while(ch=getc(f)&&ch!=EOF)

{ //do something

It'll stop if it hits a null character.
Try while((ch = getc(f) != EOF)

red floyd · Apr 22, 2007

Army1987 said:
In C, for this specific case, I'd probably write something like: [snip]
FILE* f = fopen( "somefile.txt", "r" ) ;

Click to expand...

What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

Well I wouldn't write it that way. I'd write it:

FILE *f; /* or FILE* f; */
FILE *g;

Single line declarations are probably the best.

[snip]

exit( 2 ) ;

Click to expand...

The behaviour of this is implementation-defined. On the DS9K,
exit(2) makes the program securely erase the whole disk on exit.
The standard way to do that is exit(EXIT_FAILURE);

int prev = '\0' ;

Click to expand...

You can use declarations after statements only in C99. No problem
if you have a C99-compliant compiler (gcc isn't). The same for
declarations within the guard of a for loop.

Alf P. Steinbach · Apr 22, 2007

* Army1987:

In C, for this specific case, I'd probably write something like: [snip]
FILE* f = fopen( "somefile.txt", "r" ) ;

Click to expand...

What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

The problem you encounter is that it's an ungood idea to have multiple
declarators in one declaration.

That's what you shouldn't be doing.

And when you're not doing that ungood thing, writing FILE* f makes sense.

Ernie Wright · Apr 22, 2007

Umesh said:
I wanted to say that if I want to find no. of occurance of
'abcdefghijk' in a text file by modifying this program, it would be a
toilsome job. Is there any alternative? Thank you.

I've been using the C code below for awhile. It allows me to search for
any pattern of bytes, not just strings (the pattern can contain NULs, in
other words, and can be used to find number images and other patterns of
bits). The approach is

create a buffer to read blocks of the file into
fill the start of the buffer with patlen bytes
loop
fill the rest of the buffer with buflen - patlen bytes
for each byte position in the buffer
if match( buffer byte position, pattern )
return file position
endfor
/* if we're here, we didn't find a match */
copy patlen bytes from the end of the buffer to the start
endloop
return no match found

The business with moving patlen bytes from the end to the start of the
buffer is to catch cases where the match straddles a block boundary (it
starts at the end of one buflen block of bytes and finishes in the next
one).

I'd be interested in comments about the portability of this code. It
uses several standard library functions that take or return size_t's,
but I've used long or int for the arguments and variables.

/*
======================================================================
search()

Search an open file for a byte pattern.

INPUTS
fp the file to search
pos start the search at this byte position in the file
pat the bytes to search for
len the length of pat, in bytes

RESULTS
Returns the absolute file position at which the byte pattern was
first found, or the start position otherwise.

The search actually begins at pos + 1, so that repeated searches move
forward in the file, rather than repeatedly finding a match at the
same position. The pattern must be smaller than buf[].
====================================================================== */

static long search( FILE *fp, long pos, char *pat, int len )
{
static char buf[ 1024 ];
long rlen, oldpos;
int i, found = 0;

if ( len >= sizeof( buf ))
return pos;
oldpos = pos++;
if ( fseek( fp, pos, SEEK_SET ))
return oldpos;
rlen = fread( buf, 1, len, fp );
if ( rlen < len )
return oldpos;

while ( 1 ) {
rlen = fread( buf + len, 1, sizeof( buf ) - len, fp );
for ( i = 0; i < rlen + len; i++ )
if ( !memcmp( pat, &buf[ i ], len )) {
found = 1;
break;
}
if ( found || ( rlen < ( sizeof( buf ) - len ))) break;
memcpy( buf, buf + sizeof( buf ) - len, len );
pos += sizeof( buf ) - len;
}
return found ? pos + i : oldpos;
}

- Ernie http://home.comcast.net/~erniew

Keith Thompson · Apr 22, 2007

Alf P. Steinbach said:
* Army1987:

In C, for this specific case, I'd probably write something like: [snip]
FILE* f = fopen( "somefile.txt", "r" ) ;

Click to expand...

What's the point of putting f so far on the right?
Anyhow, I'd write FILE *f rather than FILE* f, or else when you
write FILE* f, g you'll be surprised by results.

Click to expand...

The problem you encounter is that it's an ungood idea to have multiple
declarators in one declaration.

That's what you shouldn't be doing.

Agreed, mostly. There are times when combining multiple declarations
in one line can make sense:

int x, y, z;
int dx, dy, dz;

or perhaps

int x, dx;
int y, dy;
int z, dz;

(though structs might be even better). But yes, in most cases it's
clearer to have one declaration per line -- and if you prefer to make
that a strict rule, I won't disagree too strongly.

And when you're not doing that ungood thing, writing FILE* f makes sense.

There I disagree. If you're going to have declarations of any
significant complexity, you need to understand why C declaration are
the way they are (declaration follows use). With that understanding,
"FILE *f" just makes more sense.

FILE *f; /* *f is a FILE */
int a[10] /* a[10] is an int (or would be if it existed */

But in simple cases like "FILE *f" / "FILE* f", it isn't all that big
a deal. Declaring it as "FILE* f" implies that f is a FILE* -- which
happens to be true in this case, but the principle doens't extend to
other forms of declarations. As long as you can keep it straight,
I'll still *prefer* to keep the "*" next to the identifier, but I can
cope with other styles.

Gianni Mariani · Apr 22, 2007

Richard said:
Gianni Mariani said:

No, thank you. If I want C++, I know where to find it - but I don't
expect to find it in comp.lang.c.

Then stop posting to comp.lang.c++

Richard Heathfield · Apr 22, 2007

Gianni Mariani said:

Then stop posting to comp.lang.c++

I'm reading this in comp.lang.c, where you posted off-topic C++ code.
Please don't do that again. Thank you.

Default User · Apr 22, 2007

Richard said:
Gianni Mariani said:

I'm reading this in comp.lang.c, where you posted off-topic C++ code.
Please don't do that again. Thank you.

I'm sorry Richard, but you're just way off-base here. When messages are
cross-posted like that, you can't use your own group's topicality to
override the other's.

Brian

Francine.Neary · Apr 22, 2007

The code I attached below shows how you can create a state machine to
search for any sequence of characters. You'll need to adapt it to read
from a file (should be trivial). It's not designed for speed.

#include <map>
#include <vector>
#include <memory>
#include <cassert>

// ======== IteratorTranverser =======================================
* IteratorTranverser is a template class that iterates through
* a pointer or iterator. The pointers passed to it must be valid
* for the life-time of this object.
*/

template <typename Itr, typename t_key_char>

[snip]

What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?

Ian Collins · Apr 22, 2007

What on earth possessed you to send this to comp.lang.c? Do you
realize that C != C++?

Do you realise the message has been cross-posted?

Francine.Neary · Apr 22, 2007

Do you realise the message has been cross-posted?

I don't see how that makes it acceptable - he didn't even set
followups to comp.lang.c++.

Help in this program.	2	May 14, 2022
C language. work with text	3	Dec 10, 2021
WIN32 - Update Text in a Window in order to show its size in Pixels and coordinates	0	Oct 4, 2023
Command Line Arguments	0	Mar 7, 2023
[re-post] why no output	11	May 14, 2009
Working with files	1	Dec 10, 2021
a little c programming that no works!!	11	Jun 27, 2011
Windows LLDP Driver Responds With No Data	0	Mar 17, 2023

No. of 'a' in a text file

James Kanze

pete

pete

Bruce !C!+

Gianni Mariani

Richard Heathfield

Gianni Mariani

Richard Heathfield

Army1987

Army1987

red floyd

Alf P. Steinbach

Ernie Wright

Keith Thompson

Gianni Mariani

Richard Heathfield

Default User

Francine.Neary

Ian Collins

Francine.Neary

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads