URGENT help : sorting lines of a file


Rajshekhar

Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)


How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?


Please help me... it's urgent.

Thanks :)
 

Barry Schwarz

Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)

This makes no sense, since you are not interested in the words, only in
the characters. Were the instructions actually to not assume the maximum
number of characters, and did you copy them incorrectly for us?
How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?


Please help me... it's urgent.

No it is not urgent. It is homework.

One approach is to use a linked list. Each node would contain a
pointer to char that points to a dynamically acquired buffer.

Another approach is to use a dynamic (and therefore adjustable) array
of pointers to char where each pointer points to a dynamically
acquired buffer. Hint: this approach will make it easier to use
qsort.
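
To make that hint concrete, here is a small self-contained sketch of using
qsort() on an array of pointers to char. The three hardcoded strings are
just stand-ins for lines read from the file:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort() over an array of char *.
   qsort() hands us pointers to the array elements, so the
   arguments are really "pointer to char *". */
static int cmp_lines(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);   /* byte-wise comparison, as the problem allows */
}

int main(void)
{
    /* Stand-in data; in the real program each pointer would come
       from a dynamically acquired buffer filled from the file. */
    char *lines[] = { "pear", "apple", "orange" };
    size_t n = sizeof lines / sizeof lines[0];
    size_t i;

    qsort(lines, n, sizeof lines[0], cmp_lines);

    for (i = 0; i < n; i++)
        puts(lines[i]);
    return 0;
}

In the real program the array itself would also be allocated with
malloc()/realloc() so that it can grow; later posts in this thread cover
that part.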
 

osmium

"Rajshekhar"
Problem :

Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)


How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?

Make a tree. When you get done reading and inserting, the file is already
sorted! A fundamental question is "How do you handle duplicate lines?" One
answer, for a student exercise, is to assume that won't happen. In that
case each node simply corresponds to the text of a line. Write a routine to
read a line of unknown length and then put it in a properly sized array.
Each non-null node in the tree points at one of these arrays.
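
In case it helps, here is a rough sketch of what such a node and the two
essential routines might look like (no balancing, and duplicate lines are
simply sent to the right subtree; the routine that reads a line of unknown
length is discussed elsewhere in this thread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct node {
    char *text;                    /* properly sized, malloc'd copy of one line */
    struct node *left, *right;
};

/* Insert one line into the tree, keeping it ordered by strcmp(). */
struct node *insert(struct node *root, char *text)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL) {
            free(text);            /* out of memory: drop the line */
            return NULL;
        }
        n->text = text;
        n->left = n->right = NULL;
        return n;
    }
    if (strcmp(text, root->text) < 0)
        root->left = insert(root->left, text);
    else
        root->right = insert(root->right, text);
    return root;
}

/* An in-order walk visits the lines in sorted order. */
void print_sorted(const struct node *root, FILE *out)
{
    if (root != NULL) {
        print_sorted(root->left, out);
        fprintf(out, "%s\n", root->text);
        print_sorted(root->right, out);
    }
}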

The hardest part of writing a tree is handling the removal of nodes, and you
don't have to do that in this exercise.

Modify if, as and when you encounter problems.
 

Ed Prochak

Remark in passing: This is too strong a condition for
any computer with finite resources.

No it isn't. What is missing is a description of how to fail when the
limit is reached. But the limit is in the OS, not in the application
program. (E.g. think of the cp command in UNIX: it is limited by the
size of the file system, but internally it contains no such limit
value.)

[remaining good advice kept]
The problem statement hints about using a dynamically-
allocated memory area and expanding it if it turns out to
be too small. Do you know how to do this? If not, review
your notes from earlier classes or ask your teacher.

One approach would be to read and store the lines[*] and
also to maintain a dynamically-allocated array of `char*', each
pointing to the start of a line. Each time the array fills up,
expand it to make room for more `char*' pointers. When you've
read all the lines, use qsort() to sort the array of `char*'
and then write out the lines in the order indicated by the
sorted array.

[*] How do you read a line of unknown length? Same idea:
Allocate some memory to hold its characters, then read and
store until you've found the '\n' at the end of the line. If
the line is too long for the memory area, expand it and keep
on reading.
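
A minimal sketch of the whole approach described above, assuming the input
arrives on stdin and with error handling pared down to the bare minimum:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of unknown length into a malloc'd buffer, expanding
   the buffer as needed.  Returns NULL at end of file. */
char *read_line(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int ch;

    if (buf == NULL)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 == cap) {              /* buffer full: expand it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}

static int cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void)
{
    char **lines = NULL;
    size_t used = 0, cap = 0, i;
    char *line;

    while ((line = read_line(stdin)) != NULL) {
        if (used == cap) {                 /* pointer array full: expand it */
            size_t ncap = cap ? cap * 2 : 16;
            char **tmp = realloc(lines, ncap * sizeof *tmp);
            if (tmp == NULL)
                return EXIT_FAILURE;
            lines = tmp;
            cap = ncap;
        }
        lines[used++] = line;
    }

    if (used > 0)
        qsort(lines, used, sizeof *lines, cmp);

    for (i = 0; i < used; i++) {
        puts(lines[i]);
        free(lines[i]);
    }
    free(lines);
    return 0;
}

To match the original assignment, open the named input and output files
with fopen() instead of using stdin and stdout, or simply redirect them
on the command line.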

Ed
 

Walter Roberson

Ed Prochak said:
No it isn't. What is missing is a description of how to fail when the
limit is reached. But the limit is in the OS, not in the application
program. (E.g. think of the cp command in UNIX: it is limited by the
size of the file system, but internally it contains no such limit
value.)

Any implementation of cp must stat (or fstat) the names given
as parameters, in order to determine whether those names are regular
files or are directories. That limits the cp command to be used
with filesystems whose attributes can be represented within the
defined limits of struct stat; in particular, the file size field
is type off_t, which POSIX.1 requires (section 2.5) to be
a signed arithmetic type. Unless the system implements extensions
beyond standard C, that limits file sizes to long (C89) or long long (C99).
One could have filesystems (e.g., networked file systems) whose size
limits exceeded the local off_t limitations. The limitation would
then be internal to cp (and the OS interfaces), rather than
being limited to the size of the file system.
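
(For the curious, the call being described looks roughly like this; it is
POSIX rather than standard C, and the path is a made-up example:)

#include <stdio.h>
#include <sys/stat.h>      /* POSIX: struct stat, stat(), S_ISREG, S_ISDIR */

int main(void)
{
    struct stat sb;
    const char *path = "somefile";     /* made-up example path */

    if (stat(path, &sb) != 0) {
        perror("stat");
        return 1;
    }
    if (S_ISDIR(sb.st_mode))
        printf("%s is a directory\n", path);
    else if (S_ISREG(sb.st_mode))
        /* st_size has type off_t, the signed type mentioned above;
           the cast itself illustrates the kind of limit under discussion */
        printf("%s is a regular file, %ld bytes\n",
               path, (long)sb.st_size);
    return 0;
}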


Are you referring to the official Unix 'cp' as documented at
http://www.opengroup.org/onlinepubs/009695399/utilities/cp.html
or the 'cp' variants as commonly implemented on actual Unix and Unix-like
(e.g., Linux) systems?

If you are referring to cp as implemented on actual systems, then
support for sparse files is not uncommon, and the implementation
then gets more complicated, with lseek() calls being made to create
the "holes", again invoking compiler and OS limits rather than
necessarily hitting the filesystem limits.
 

CBFalconer

Rajshekhar said:
Given a text file as input, sort the lines and store them in the specified
output file. The sorting need not make any differentiation with
regard to the characters (whether they are alpha / numeric /
alphanumeric / special). It can compare them using their ASCII values.

IMP *********** The program should not hardcode the number of lines in
the file in ANY manner. In other words, the program should keep
dynamically allocating memory as it reads from the file.

IMP *********** You should *not* assume and hardcode the number of
words in the file. (a convenient usage of a huge array of structures
is strictly prohibited)

How do I make sure that I don't hardcode the number of lines, and what
kind of data structure should I use?

The easiest thing is to read lines into a linked list. Then sort
the list. If you use my ggets (see below) getting the list is dead
easy. See freverse in the published code. Sorting is also dead
easy if you use mergesort. I have published a suitable
implementation in this newsgroup in the past. Search for
'listops'.

ggets is available in source form, and released to public domain,
at:

<http://cbfalconer.home.att.net/download/ggets.zip>
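
For reference, here is a cut-down sketch of the merge sort part for a
singly linked list of lines. This is not the 'listops' code mentioned
above, just an illustration of the idea; the nodes are assumed to hold
malloc'd line buffers:

#include <string.h>
#include <stddef.h>

struct lnode {
    char *line;            /* malloc'd text of one line */
    struct lnode *next;
};

/* Merge two already-sorted lists into one sorted list. */
static struct lnode *merge(struct lnode *a, struct lnode *b)
{
    struct lnode head, *tail = &head;

    while (a != NULL && b != NULL) {
        if (strcmp(a->line, b->line) <= 0) { tail->next = a; a = a->next; }
        else                               { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;
    return head.next;
}

/* Classic recursive merge sort: split the list in two with the
   slow/fast pointer trick, sort each half, then merge. */
struct lnode *sort_list(struct lnode *list)
{
    struct lnode *slow, *fast, *second;

    if (list == NULL || list->next == NULL)
        return list;

    slow = list;
    fast = list->next;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;   /* cut the list after the midpoint */
    slow->next = NULL;

    return merge(sort_list(list), sort_list(second));
}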
 

Ed Prochak

Any implementation of cp must stat (or fstat) the names given
as parameters, in order to determine whether those names are regular
files or are directories. That limits the cp command to be used
with filesystems whose attributes can be represented within the
defined limits of struct stat; in particular, the file size field
is type off_t, which POSIX.1 requires (section 2.5) to be
a signed arithmetic type. Unless the system implements extensions
beyond standard C, that limits file sizes to long (C89) or long long (C99).
One could have filesystems (e.g., networked file systems) whose size
limits exceeded the local off_t limitations. The limitation would
then be internal to cp (and the OS interfaces), rather than
being limited to the size of the file system.

Exactly. It is an OS interface issue, not a WAG number.

Are you referring to the official Unix 'cp' as documented at
http://www.opengroup.org/onlinepubs/009695399/utilities/cp.html
or the 'cp' variants as commonly implemented on actual Unix and Unix-like
(e.g., Linux) systems?

If you are referring to cp as implemented on actual systems, then
support for sparse files is not uncommon, and the implementation
then gets more complicated, with lseek() calls being made to create
the "holes", again invoking compiler and OS limits rather than
necessarily hitting the filesystem limits.

The point is the limits are outside the program.

Looking back at your original remark, the issue is that the constraint
was not to put an arbitrary limit on the program, while you remarked
there has to be a limit on a computer system. Basically I think you
extended the constraint needlessly and then tried to shoot it down.

I think we all agree there are finite resources. At least no one I
know of is using a Turing machine with the infinite tape. 8^)



Nice quote.
Ed
 
