Adaptive datatype (or so...)


Whatever5k

Ok, I got a big problem. What I want to do is basically read from a
file. This file contains "symbols". The user specifies how many bytes
one symbol needs. For example, in a text file each symbol would need
one byte. So I want to read into an array, say buf[], where buf[0]
contains the first symbol, buf[1] the second, and so on.
How do I realize that? I mean, the program has to be able to handle any
number of bytes per symbol. For example:

file content = "123456789"

example #1. bytes per symbol = 1.
In this case,
buf[0] = 1
buf[1] = 2
buf[2] = 3
etc.

example #2, bytes per symbol = 2.
In this case,
buf[0] = 12
buf[1] = 34
etc.

Any idea how to do this, guys?
 

Eric Sosman

Ok, I got a big problem. What I want to do is basically read from a
file. This file contains "symbols". The user specifies how many bytes
one symbol needs. For example, in a text file each symbol would need
one byte. So I want to read into an array, say buf[], where buf[0]
contains the first symbol, buf[1] the second, and so on.
How do I realize that? I mean, the program has to be able to handle any
number of bytes per symbol. For example:

file content = "123456789"

example #1. bytes per symbol = 1.
In this case,
buf[0] = 1
buf[1] = 2
buf[2] = 3
etc.

example #2, bytes per symbol = 2.
In this case,
buf[0] = 12
buf[1] = 34
etc.

Any idea how to do this, guys?

Several, but I don't know how to choose among them.
What do you want to do with these "symbols" after they
have been loaded into the array?
 

Whatever5k

Well, I want to insert them into a binary tree. But that's not the
problem.
My problem is to have an array that contains the symbols.

Could you just choose one simple way of doing this job?
 

Default User

Well, I want to insert them into a binary tree. But that's not the
problem.
My problem is to have an array that contains the symbols.

Could you just choose one simple way of doing this job?


Please quote enough of the previous message for context. See how
everybody else in the group does it.


You haven't explained what you are trying to accomplish. What are these
"symbols"? How will they be used once you create the array?





Brian
 

Eric Sosman

Well, I want to insert them into a binary tree. But that's not the
problem.
My problem is to have an array that contains the symbols.

Please quote enough context so your message can stand on
its own. Message propagation on Usenet is both asynchronous
and uncoordinated, meaning that messages do not arrive at all
news servers at the same time or in the same order. It is
entirely possible for a reply to reach a server before the
message it replies to.
Could you just choose one simple way of doing this job?

(For those just joining: "this job" is to extract some
kind of "symbols" from a file and store them in an array.
We are told that the symbols are sometimes one character
long and sometimes two, and possibly other lengths, but
that the symbol length is fixed during any given program
execution. We are not told whether these symbols can
just be thought of as strings or are something else; we are
not told whether newlines in the file have any importance
or are just parts of symbols; we are not told very much at
all. I asked what Whatever5k wanted to do with the symbols,
because data structures exist to support the operations to
be performed on the data; without knowing what the operations
are, it is impossible to make an intelligent recommendation.
His response was as you see above, so ...)

Here's one simple way: Allocate a big array of characters
and read the entire file into it. The array will then contain
all the "symbols" from the file.
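
A minimal sketch of that "read the entire file" approach, under a few assumptions: the helper name read_all, the 4096-byte starting capacity, and the doubling growth policy are illustrative choices, not something specified in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from fp into one malloc'd buffer.
 * Stores the byte count in *len; returns NULL on allocation or read
 * failure. The caller frees the result. */
unsigned char *read_all(FILE *fp, size_t *len)
{
    size_t cap = 4096, used = 0, got;
    unsigned char *buf = malloc(cap);

    assert(fp != NULL && len != NULL);
    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + used, 1, cap - used, fp)) > 0) {
        used += got;
        if (used == cap) {              /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *len = used;
    return buf;
}
```

With the file in one buffer, symbol number i of a file read at bps bytes per symbol simply starts at buf + i * bps, so no per-symbol allocation is needed. For binary data the stream should be opened in binary mode ("rb").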
 

Bill Pursell

Ok, I got a big problem. What I want to do is basically read from a
file. This file contains "symbols". The user specifies how many bytes
one symbol needs. For example, in a text file each symbol would need
one byte. So I want to read into an array, say buf[], where buf[0]
contains the first symbol, buf[1] the second, and so on.
How do I realize that? I mean, the program has to be able to handle any
number of bytes per symbol. For example:

It's not exactly clear what you want, but here's one
idea.

#include <stdio.h>
#include <stdlib.h>
#define MAX_LENGTH 8

void die(const char *a)
{
    fprintf(stderr, "%s\n", a);
    exit(EXIT_FAILURE);
}

void *xmalloc(size_t size)
{
    void *ret;
    if ((ret = malloc(size)) == NULL)
        die("out of memory");
    return ret;
}

int
main(int argc, char **argv)
{
    int symbol_size;
    char **symbol_array;
    size_t symbol_count;
    size_t array_size;      /* capacity of symbol_array, in pointers */

    symbol_size = (argc > 1) ? atoi(argv[1]) : 1;
    if (symbol_size <= 0 || symbol_size > MAX_LENGTH)
        die("Invalid size");

    array_size = BUFSIZ;
    symbol_array = xmalloc(array_size * sizeof *symbol_array);
    symbol_count = 0;
    symbol_array[0] = xmalloc(symbol_size);
    while (fread(symbol_array[symbol_count], symbol_size, 1, stdin) == 1) {
        symbol_count++;
        /* Grow the pointer array before using the next slot. */
        if (symbol_count == array_size) {
            array_size *= 2;
            symbol_array = realloc(symbol_array,
                                   array_size * sizeof *symbol_array);
            if (symbol_array == NULL)
                die("out of memory");
        }
        symbol_array[symbol_count] = xmalloc(symbol_size);
    }
    /* symbol_array[0 .. symbol_count-1] now hold the symbols. */
    return EXIT_SUCCESS;
}
 

Barry Schwarz

Ok, I got a big problem. What I want to do is basically read from a
file. This file contains "symbols". The user specifies how many bytes
one symbol needs. For example, in a text file each symbol would need
one byte. So I want to read into an array, say buf[], where buf[0]
contains the first symbol, buf[1] the second, and so on.
How do I realize that? I mean, the program has to be able to handle any
number of bytes per symbol. For example:

file content = "123456789"

example #1. bytes per symbol = 1.
In this case,
buf[0] = 1
buf[1] = 2
buf[2] = 3
etc.

example #2, bytes per symbol = 2.
In this case,
buf[0] = 12
buf[1] = 34
etc.

Any idea how to do this, guys?

I recommend a dynamic array of pointers to strings. Once you decide
on the number of bytes per symbol (bps), something like the following
will work (error checking of malloc omitted for brevity):

char **ptr;
int i = 0;
ptr = malloc(n * sizeof *ptr); /* for some initial quantity of strings */
while (/* more strings to process */) {
    ptr[i] = malloc(bps + 1);
    strncpy(ptr[i], /* pointer to starting byte for next symbol */, bps);
    ptr[i++][bps] = '\0';
}

You will need to include a check for i exceeding the number of
pointers ptr points to. When it does, realloc ptr to point to a
larger number and continue.

If your data is binary rather than text, you can do the same thing
with arrays of unsigned char and use memcpy instead of strncpy. The
extra space for the terminating '\0' would not be needed.
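
As a rough sketch of that binary variant, assuming the file contents are already in memory as an array of bytes: the function name split_symbols and its parameters are hypothetical, and dropping a trailing partial symbol is one possible policy among several.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy each complete bps-byte symbol of data[0..len)
 * into its own allocation. Trailing bytes that do not fill a whole symbol
 * are ignored. Returns NULL on allocation failure or if there is no
 * complete symbol; *count receives the number of symbols stored. */
unsigned char **split_symbols(const unsigned char *data, size_t len,
                              size_t bps, size_t *count)
{
    size_t n, i;
    unsigned char **ptr;

    assert(data != NULL && count != NULL && bps > 0);
    n = len / bps;
    *count = 0;
    if (n == 0)
        return NULL;
    ptr = malloc(n * sizeof *ptr);
    if (ptr == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        ptr[i] = malloc(bps);           /* no extra byte for '\0' */
        if (ptr[i] == NULL) {
            while (i > 0)
                free(ptr[--i]);
            free(ptr);
            return NULL;
        }
        memcpy(ptr[i], data + i * bps, bps);
    }
    *count = n;
    return ptr;
}
```

Because the symbols are not '\0'-terminated, they have to be compared with memcmp(a, b, bps) rather than strcmp.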



 

Whatever5k

Thank you for all those replies.
OK, so I have a file, this can be binary or text or anything. What I
want to do is read from that file, symbol by symbol. What I mean by
symbol is just a certain amount of bytes. For example, I want to be
able to read from the file with a symbol size of 2 bytes. This would
mean that at the end I would have an array and each entry would contain
2 bytes of information from the file.
Oh and the file can also be binary. Is it more clear now? What I want
to do with the symbols later on is just count them. I want to see how
many different symbols are in the file.

Thanks.
 

Whatever5k

Barry, your example would not work for a binary file.
OK, here is another example. Let's say I have got a binary file, that
contains the year and month number of today. Now, this would be written
with 0 and 1, but it would look like this: 200607. Ok, we would say
that one symbol occupies 4 bytes. So 2006 would be the first symbol and
07 the second. What I want to have is an array so that ptr[0] = 2006
and ptr[1] = 07.
I don't think that would work with your examples, would it?
 

Eric Sosman

Barry, your example would not work for a binary file.
OK, here is another example. Let's say I have got a binary file, that
contains the year and month number of today. Now, this would be written
with 0 and 1, but it would look like this: 200607.

Your description of the data format is still unclear
(to me, anyhow). Do you mean that the file contains the
number "two hundred thousand six hundred seven" as a
binary integer in the machine's native form (probably four
or eight bytes long)? What does "look like this" mean?
Ok, we would say
that one symbol occupies 4 bytes. So 2006 would be the first symbol and
07 the second.

It sounds like 07xx would be the second, where the x's
are two more bytes. What do you mean when you say 07 is
a four-byte "symbol?"
What I want to have is an array so that ptr[0] = 2006
and ptr[1] = 07.

It seems you don't realize that C supports many different
data types, and can represent 2006 in many different ways.
Some of them are

- As an int. The number two thousand six would look like
...011111010110 in the machine, where the "..." stand
for a machine-dependent number of leading zero bits.

- As another integer type: signed or unsigned long long,
long, int, short, and so on. The value would be as above,
but perhaps with more or fewer leading zeroes.

- As a float. C doesn't prescribe any particular floating-
point format, but on many machines two thousand six would
be represented as {0.9794921875 times two to the eleventh}
and might look like 01000100111110101100000000000000 if
viewed as a sequence of bits.

- As another floating-point type: double or long double.
Again, C doesn't prescribe the exact format, but it is
likely to be somewhat like that shown for float.

- As a string of four digits followed by a fifth all-zero
byte (to mark the end of the string). On many machines
this would look like 00110010 00110000 00110000 00110110
00000000 in five consecutive memory locations.

- As a pointer to a string of the form described above.
Strings are really arrays, and C cannot manipulate arrays
as freely as it handles other kinds of objects, so it is
often desirable to store the strings "elsewhere," leave
them pretty much alone, and work with pointers to them
instead. (Especially given the confusion over the "four-
byte symbol" 07 -- if the symbols actually have different
lengths, it will be cumbersome to work with them directly
as arrays.)

Let me repeat: These are only *some* of the ways you might
represent a "symbol" in a C program. Also, these are variations
on ways to represent just *one* of your "symbols;" there are
additional decisions to be made when you choose how to manage a
collection of many of them. I hope it's clear by now that simply
saying `ptr[0] = 2006' is not an adequate description of what you
are trying to accomplish; you need to be more specific.
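
To make a few of these representations concrete, here is a small sketch that renders values as bit strings. The helper names bits_of and float_bits are made up for illustration, and float_bits assumes a 32-bit float (IEEE 754 single precision on most machines), which C itself does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the low nbits bits of v into out as '0'/'1'
 * characters, most significant bit first, followed by a '\0'.
 * out must have room for nbits + 1 chars. */
void bits_of(uint32_t v, int nbits, char *out)
{
    int i;

    assert(out != NULL && nbits >= 1 && nbits <= 32);
    for (i = 0; i < nbits; i++)
        out[i] = ((v >> (nbits - 1 - i)) & 1u) ? '1' : '0';
    out[nbits] = '\0';
}

/* Hypothetical helper: the object representation of a float, copied
 * into an integer of the same size. Assumes float is 32 bits wide. */
uint32_t float_bits(float f)
{
    uint32_t u;

    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Calling bits_of(2006u, 12, out) gives the integer pattern quoted above, and on an IEEE 754 machine bits_of(float_bits(2006.0f), 32, out) gives the float pattern. The string form "2006" occupies sizeof "2006" == 5 bytes, the fifth being the terminating zero byte.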
 

Tak-Shing Chan

Barry, your example would not work for a binary file.
OK, here is another example. Let's say I have got a binary file, that
contains the year and month number of today. Now, this would be written
with 0 and 1, but it would look like this: 200607.

Your description of the data format is still unclear
(to me, anyhow). Do you mean that the file contains the
number "two hundred thousand six hundred seven" as a
binary integer in the machine's native form (probably four
or eight bytes long)? What does "look like this" mean?
Ok, we would say
that one symbol occupies 4 bytes. So 2006 would be the first symbol and
                         ^^^^^^^
07 the second.

It sounds like 07xx would be the second, where the x's
are two more bytes. What do you mean when you say 07 is
a four-byte "symbol?"
What I want to have is an array so that ptr[0] = 2006
and ptr[1] = 07.

It seems you don't realize that C supports many different
data types, and can represent 2006 in many different ways.
Some of them are

[snipped]

The OP has explicitly requested that his data type is to be
``4 bytes'' (underlined above) which according to the C standard
means an object containing 4 * CHAR_BIT bits. Therefore, the OP
was trying to say this:

char ptr[2][4] = {{'2', '0', '0', '6'}, {'0', '7'}};

when he/she wrote ``ptr[0] = 2006 and ptr[1] = 07''.

Tak-Shing
 

Default User

Thank you for all those replies.
OK, so I have a file, this can be binary or text or anything. What I
want to do is read from that file, symbol by symbol. What I mean by
symbol is just a certain amount of bytes. For example, I want to be
able to read from the file with a symbol size of 2 bytes. This would
mean that at the end I would have an array and each entry would
contain 2 bytes of information from the file.

So you just want to hold the raw bytes? That's fairly simple, although
I don't really see how it's useful in the larger scheme.

If I were doing this, I'd have two different functions, one for binary
files and the other for text. Presumably in the text case you will be
ignoring newline characters, but perhaps not.
Oh and the file can also be binary. Is it more clear now? What I want
to do with the symbols later on is just count them. I want to see how
many different symbols are in the file.

Personally, I wouldn't store the "symbols" if all I wanted to do was
count. I'd open the file and count.
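
For the special case of one byte per symbol, counting without storing the symbols might look like the sketch below. The function name count_distinct_bytes and the "seen" table are assumptions for illustration; larger symbol sizes need a different structure, since a table indexed by symbol value quickly becomes too big.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: count how many different byte values occur in fp.
 * Only covers the case of one byte per symbol. */
size_t count_distinct_bytes(FILE *fp)
{
    unsigned char seen[UCHAR_MAX + 1] = {0};  /* one flag per byte value */
    size_t distinct = 0;
    int c;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF) {
        if (!seen[c]) {                 /* first time we meet this value */
            seen[c] = 1;
            distinct++;
        }
    }
    return distinct;
}
```

Nothing from the file is kept except one flag per possible byte value.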




Brian
 

Whatever5k

Hi Brian,

OK, you want to count, that's fine. But you have to make sure you count
identical objects as one. So, if you had a text file "12235", you would
count 3 symbols which appear once and 1 symbol that appears twice. So
how do you manage to do that?
 

Default User

Hi Brian,

OK, you want to count, that's fine. But you have to make sure you count
identical objects as one. So, if you had a text file "12235", you would
count 3 symbols which appear once and 1 symbol that appears twice. So
how do you manage to do that?


So you want a frequency, essentially. Now I see where you're coming
from. First thing you need is a data structure. Then an algorithm.


For the data, you need some way of representing information about a
symbol:

struct node
{
char *symbol;
int count;
};

Then you need to store all the symbol information. A dynamic array is a
possibility, as is a linked list. Are either of those something you
know how to do?

Now you will need a routine to extract a symbol. Your symbols can have
different lengths, so you need a function that takes in the filename
(or a pointer to an open file) and the symbol size.

It's relatively easy to scan the file to put together the symbol.
fgetc() or fscanf() are possibilities. Once you get a symbol assembled,
you need a routine to process the symbol. This will check the data to
see if it already has a record. If so, it increments the count,
otherwise it adds it.
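
A minimal sketch of that scheme, using a dynamic array with a linear search. Several things here are assumptions rather than part of the description above: the struct table wrapper and the names add_symbol and count_symbols are made up, the symbol is stored as unsigned char so that binary data works, fread() is used in place of fgetc(), and a trailing partial symbol is silently ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    unsigned char *symbol;  /* bps raw bytes, not '\0'-terminated */
    int count;
};

struct table {              /* hypothetical dynamic array of nodes */
    struct node *nodes;
    size_t used, cap;
};

/* Record one occurrence of sym (bps bytes). Returns 0 on success, -1 on
 * allocation failure. Linear search: simple, but slow for many symbols. */
int add_symbol(struct table *t, const unsigned char *sym, size_t bps)
{
    size_t i;

    for (i = 0; i < t->used; i++) {
        if (memcmp(t->nodes[i].symbol, sym, bps) == 0) {
            t->nodes[i].count++;        /* already has a record */
            return 0;
        }
    }
    if (t->used == t->cap) {            /* array full: grow it */
        size_t ncap = t->cap ? t->cap * 2 : 16;
        struct node *tmp = realloc(t->nodes, ncap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        t->nodes = tmp;
        t->cap = ncap;
    }
    t->nodes[t->used].symbol = malloc(bps);
    if (t->nodes[t->used].symbol == NULL)
        return -1;
    memcpy(t->nodes[t->used].symbol, sym, bps);
    t->nodes[t->used].count = 1;
    t->used++;
    return 0;
}

/* Read fp in bps-byte symbols and tally them in t, which must start out
 * as {NULL, 0, 0}. Returns 0 on success, -1 on failure. */
int count_symbols(FILE *fp, size_t bps, struct table *t)
{
    unsigned char *sym;
    int rc = 0;

    assert(fp != NULL && t != NULL && bps > 0);
    sym = malloc(bps);
    if (sym == NULL)
        return -1;
    while (rc == 0 && fread(sym, bps, 1, fp) == 1)
        rc = add_symbol(t, sym, bps);
    if (ferror(fp))
        rc = -1;
    free(sym);
    return rc;
}
```

After count_symbols() returns, t.used is the number of distinct symbols and each t.nodes[i].count is that symbol's frequency. For large inputs, a binary tree or hash table keyed on the symbol bytes would avoid the linear search.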



Brian
 

Keith Thompson

Hi Brian,

OK, you want to count, that's fine. But you have to make sure you count
identical objects as one. So, if you had a text file "12235", you would
count 3 symbols which appear once and 1 symbol that appears twice. So
how do you manage to do that?

Please provide enough context so your followup makes sense on its own.
Google no longer makes this gratuitously difficult. See
<http://cfaj.freeshell.org/google/> for details; see most of the
followups in this newsgroup for examples.

Are you trying to count the number of *distinct* symbols? I don't
think you've mentioned this before.

What exactly do you mean by "symbol"? Is a "symbol" just a sequence
of some specified number of bytes in the input file?
 

Default User

Are you trying to count the number of distinct symbols? I don't
think you've mentioned this before.

What exactly do you mean by "symbol"? Is a "symbol" just a sequence
of some specified number of bytes in the input file?


I guess you're like me, still have trouble figuring out what the goal
is. Getting information is like pulling teeth.




Brian
 
