URGENT help : sorting lines of a file


Rajshekhar

Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)


How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?


Please help me... it's urgent.

Thanks :)
 

Barry Schwarz

Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)

This makes no sense, since you are not interested in the words, only in
the characters. Were the instructions actually to not assume the maximum
number of characters, and did you copy them incorrectly for us?
How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?


Please help me... it's urgent.

No it is not urgent. It is homework.

One approach is to use a linked list. Each node would contain a
pointer to char that points to a dynamically acquired buffer.

Another approach is to use a dynamic (and therefore adjustable) array
of pointers to char where each pointer points to a dynamically
acquired buffer. Hint: this approach will make it easier to use
qsort.
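
To make that hint concrete, here is a small self-contained sketch of using
qsort() on an array of pointers to char. The three hardcoded strings are
just stand-ins for lines read from the file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort() over an array of char *.
   qsort() hands us pointers to the array elements, so the
   arguments are really "pointer to char *". */
static int cmp_lines(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);   /* byte-wise comparison, as the problem allows */
}

int main(void)
{
    /* Stand-in data; in the real program each pointer would come
       from a dynamically acquired buffer filled from the file. */
    char *lines[] = { "pear", "apple", "orange" };
    size_t n = sizeof lines / sizeof lines[0];
    size_t i;

    qsort(lines, n, sizeof lines[0], cmp_lines);

    for (i = 0; i < n; i++)
        puts(lines[i]);
    return 0;
}

In the real program the array itself would also be allocated with
malloc()/realloc() so that it can grow; later posts in this thread cover
that part.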
 

osmium

"Rajshekhar"
Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)


How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?

Make a tree. When you get done reading and inserting, the file is already
sorted! A fundamental question is "How do you handle duplicate lines?" One
answer, for a student exercise, is to assume that won't happen. In that
case each node simply corresponds to the text of a line. Write a routine to
read a line of unknown length and then put it in a properly sized array.
Each non-null node in the tree points at one of these arrays.
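
In case it helps, here is a rough sketch of what such a node and the two
essential routines might look like (no balancing, and duplicate lines are
simply sent to the right subtree; the routine that reads a line of unknown
length is discussed elsewhere in this thread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char *text;                    /* properly sized, malloc'd copy of one line */
    struct node *left, *right;
};

/* Insert one line into the tree, keeping it ordered by strcmp(). */
struct node *insert(struct node *root, char *text)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) {
            free(text);            /* out of memory: drop the line */
            return NULL;
        }
        n->text = text;
        n->left = n->right = NULL;
        return n;
    }
    if (strcmp(text, root->text) < 0)
        root->left = insert(root->left, text);
    else
        root->right = insert(root->right, text);
    return root;
}

/* An in-order walk visits the lines in sorted order. */
void print_sorted(const struct node *root, FILE *out)
{
    if (root != NULL) {
        print_sorted(root->left, out);
        fprintf(out, "%s\n", root->text);
        print_sorted(root->right, out);
    }
}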

The hardest part of writing a tree is handling the removal of nodes, and you
don't have to do that in this exercise.

Modify if, as and when you encounter problems.
 

Ed Prochak

Remark in passing: This is too strong a condition for
any computer with finite resources.

No it isn't. What is missing is a description of how to fail when the
limit is reached. But the limit is in the OS, not in the application
program. (E.g. think of the cp command in UNIX: it is limited by the
size of the file system, but internally it contains no such limit
value.)

[remaining good advice kept]
The problem statement hints about using a dynamically-
allocated memory area and expanding it if it turns out to
be too small. Do you know how to do this? If not, review
your notes from earlier classes or ask your teacher.

One approach would be to read and store the lines[*] and
also to maintain a dynamically-allocated array of `char*', each
pointing to the start of a line. Each time the array fills up,
expand it to make room for more `char*' pointers. When you've
read all the lines, use qsort() to sort the array of `char*'
and then write out the lines in the order indicated by the
sorted array.

[*] How do you read a line of unknown length? Same idea:
Allocate some memory to hold its characters, then read and
store until you've found the '\n' at the end of the line. If
the line is too long for the memory area, expand it and keep
on reading.
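
A minimal sketch of the whole approach described above, assuming the input
arrives on stdin and with error handling pared down to the bare minimum:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of unknown length into a malloc'd buffer, expanding
   the buffer as needed.  Returns NULL at end of file. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {              /* buffer full: expand it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

static int cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void)
{
    char **lines = NULL;
    size_t used = 0, cap = 0, i;
    char *line;

    while ((line = read_line(stdin)) != NULL) {
        if (used == cap) {                 /* pointer array full: expand it */
            size_t ncap = cap ? cap * 2 : 16;
            char **tmp = realloc(lines, ncap * sizeof *tmp);
            if (tmp == NULL)
                return EXIT_FAILURE;
            lines = tmp;
            cap = ncap;
        }
        lines[used++] = line;
    }

    if (used > 0)
        qsort(lines, used, sizeof *lines, cmp);

    for (i = 0; i < used; i++) {
        puts(lines[i]);
        free(lines[i]);
    }
    free(lines);
    return 0;
}

To match the original assignment, open the named input and output files
with fopen() instead of using stdin and stdout, or simply redirect them
on the command line.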

Ed
 

Walter Roberson

Ed Prochak said:
No it isn't. What is missing is a description of how to fail when the
limit is reached. But the limit is in the OS, not in the application
program. (E.g. think of the cp command in UNIX: it is limited by the
size of the file system, but internally it contains no such limit
value.)

Any implementation of cp must stat (or fstat) the names given
as parameters, in order to determine whether those names are regular
files or are directories. That limits the cp command to be used
with filesystems whose attributes can be represented within the
defined limits of struct stat; in particular, the file size field
is type off_t, which POSIX.1 requires (section 2.5) to be
a signed arithmetic type. Unless the system implements extensions
beyond standard C, that limits file sizes to long (C89) or long long (C99).
One could have filesystems (e.g., networked file systems) whose size
limits exceeded the local off_t limitations. The limitation would
then be internal to cp (and the OS interfaces), rather than
being limited to the size of the file system.
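
(For the curious, the call being described looks roughly like this; it is
POSIX rather than standard C, and the path is a made-up example:)

#include <stdio.h>
#include <sys/stat.h>      /* POSIX: struct stat, stat(), S_ISREG, S_ISDIR */

int main(void)
{
    struct stat sb;
    const char *path = "somefile";     /* made-up example path */

    if (stat(path, &sb) != 0) {
        perror("stat");
        return 1;
    }
    if (S_ISDIR(sb.st_mode))
        printf("%s is a directory\n", path);
    else if (S_ISREG(sb.st_mode))
        /* st_size has type off_t, the signed type mentioned above;
           the cast itself illustrates the kind of limit under discussion */
        printf("%s is a regular file, %ld bytes\n",
               path, (long)sb.st_size);
    return 0;
}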


Are you referring to the official Unix 'cp' as documented at
http://www.opengroup.org/onlinepubs/009695399/utilities/cp.html
or the 'cp' variants as commonly implemented on actual Unix and Unix-like
(e.g., Linux) systems?

If you are referring to cp as implemented on actual systems, then
support for sparse files is not uncommon, and the implementation
then gets more complicated, with lseek() calls being made to create
the "holes", again invoking compiler and OS limits rather than
necessarily hitting the filesystem limits.
 

CBFalconer

Rajshekhar said:
Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)

How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?

The easiest thing is to read lines into a linked list. Then sort
the list. If you use my ggets (see below) getting the list is dead
easy. See freverse in the published code. Sorting is also dead
easy if you use mergesort. I have published a suitable
implementation in this newsgroup in the past. Search for
'listops'.

ggets is available in source form, and released to public domain,
at:

<http://cbfalconer.home.att.net/download/ggets.zip>
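
For reference, here is a cut-down sketch of the merge sort part for a
singly linked list of lines. This is not the 'listops' code mentioned
above, just an illustration of the idea; the nodes are assumed to hold
malloc'd line buffers:

#include <string.h>
#include <stddef.h>

struct lnode {
    char *line;            /* malloc'd text of one line */
    struct lnode *next;
};

/* Merge two already-sorted lists into one sorted list. */
static struct lnode *merge(struct lnode *a, struct lnode *b)
{
    struct lnode head, *tail = &head;

    while (a != NULL && b != NULL) {
        if (strcmp(a->line, b->line) <= 0) { tail->next = a; a = a->next; }
        else                               { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return head.next;
}

/* Classic recursive merge sort: split the list in two with the
   slow/fast pointer trick, sort each half, then merge. */
struct lnode *sort_list(struct lnode *list)
{
    struct lnode *slow, *fast, *second;

    if (list == NULL || list->next == NULL)
        return list;

    slow = list;
    fast = list->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;   /* cut the list after the midpoint */
    slow->next = NULL;

    return merge(sort_list(list), sort_list(second));
}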
 

Ed Prochak

Any implementation of cp must stat (or fstat) the names given
as parameters, in order to determine whether those names are regular
files or are directories. That limits the cp command to be used
with filesystems whose attributes can be represented within the
defined limits of struct stat; in particular, the file size field
is type off_t, which POSIX.1 requires (section 2.5) to be
a signed arithmetic type. Unless the system implements extensions
beyond standard C, that limits file sizes to long (C89) or long long (C99).
One could have filesystems (e.g., networked file systems) whose size
limits exceeded the local off_t limitations. The limitation would
then be internal to cp (and the OS interfaces), rather than
being limited to the size of the file system.

Exactly. It is an OS interface issue, not a WAG number.

Are you referring to the official Unix 'cp' as documented at
http://www.opengroup.org/onlinepubs/009695399/utilities/cp.html
or the 'cp' variants as commonly implemented on actual Unix and Unix-like
(e.g., Linux) systems?

If you are referring to cp as implemented on actual systems, then
support for sparse files is not uncommon, and the implementation
then gets more complicated, with lseek() calls being made to create
the "holes", again invoking compiler and OS limits rather than
necessarily hitting the filesystem limits.

The point is the limits are outside the program.

Looking back at your original remark, the issue is that the constraint
was not to put an arbitrary limit on the program, while you remarked
there has to be a limit on a computer system. Basically I think you
extended the constraint needlessly and then tried to shoot it down.

I think we all agree there are finite resources. At least no one I
know of is using a Turing machine with the infinite tape. 8^)



Nice quote.
Ed
 
