Matthew Moss
Note that because I am traveling tomorrow, I've posted this week's
quiz a bit early.
The three rules of Ruby Quiz 2:
1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have passed from the time on this message.
2. Support Ruby Quiz 2 by submitting ideas as often as you can! (A
permanent, new website is in the works for Ruby Quiz 2. Until then,
please visit the temporary website at
<http://matthew.moss.googlepages.com/home>.)
3. Enjoy!
Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Quiz #159
Food Database
There are numerous themes we have encountered across all of the past Ruby Quiz
problems, but there are a few that come back time and time again, albeit
sometimes in disguise. I can recall a number of quizzes that were best, or most
easily, approached using pattern matching. Data searching is also a common
theme, most often accessing the large, well-known databases of vocabulary and
numbers.
This week we're going to explore another large database that you might
not be familiar with: the USDA's Nutrient Database. You can find out about
this database at:
<http://www.ars.usda.gov/services/docs.htm?docid=8964>
The current database (SR20) can be downloaded from:
<http://www.ars.usda.gov/Services/docs.htm?docid=15867>
I recommend getting the abbreviated, ASCII download (a flat-file database),
though those who want to experience the full brunt of the relational database
are welcome to download that. I will focus on the abbreviated version, since
it will serve our needs for this and future quizzes.
Opening the archive for the abbreviated database, you'll find two files:
* ABBREV.txt: this is the ASCII database
* SR20_doc.pdf: a document describing the format and content of the abbreviated
database.
(Note that SR20 now also contains a patch to the database. For the purposes of
this quiz, I am not concerned whether you apply that patch or not. If you
don't want to worry about the patch, feel free to ignore it.)
The format of the database is fairly simple; the provided document explains
the abbreviated file format beginning on page 29. To summarize, each record
is a single line and contains more than a few delimited fields. Fields are
*separated* by carets (^), and text fields are *surrounded* by tildes (~).
The file is sorted by the first field, the food's Nutrient Databank Number
(NDB). Each line provides nutrient information for 100 grams of that food.
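As a rough illustration of the format just described, a single record could be split along these lines. This is only a sketch: the helper name `parse_record` is hypothetical, the sample line in the comment is made up, and treating an empty field as a missing value is an assumption that should be checked against SR20_doc.pdf.

```ruby
# Split one ABBREV.txt record into an array of Ruby values.
# Assumptions (not confirmed by the quiz text): an empty field means
# "no value", and every non-text field parses as a number.
def parse_record(line)
  line.chomp.split('^', -1).map do |field|
    if field.length >= 2 && field.start_with?('~') && field.end_with?('~')
      field[1..-2]          # text field: strip the surrounding tildes
    elsif field.empty?
      nil                   # assumed: blank means missing
    else
      Float(field)          # numeric field; raises ArgumentError if malformed
    end
  end
end

# Illustrative (made-up) input:
#   parse_record("~01001~^~BUTTER,WITH SALT~^15.87^^717")
```

Passing `-1` as the limit to `split` keeps trailing empty fields, so every record yields the same number of columns.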
Your task is to provide a function that will search this nutrient database
for a food and provide information about it.
    def nutrient_report(food, weight=100)
      # print report to stdout
    end
Parameter **food** will be a string that is the food to locate. Keep in mind
that there may be multiple entries that will simply match (a la grep) the
parameter provided. You should only report on one of these foods at this time;
which one to choose is up to you. You may want to consider a metric such as
the Levenshtein Distance (http://en.wikipedia.org/wiki/Levenshtein_distance)
while comparing food names against the search string.
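One possible way to pick a single food, sketched under assumptions: keep only the names that contain the search string (ignoring case), then choose the one with the smallest Levenshtein distance to it. The helper names `levenshtein` and `best_match` are hypothetical, and this is just one metric among many you might prefer.

```ruby
# Classic dynamic-programming edit distance, keeping only two rows.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1,            # deletion
               curr[j - 1] + 1,        # insertion
               prev[j - 1] + cost].min # substitution or match
    end
    prev = curr
  end
  prev.last
end

# Return the grep-style match closest to the query, or nil if none match.
def best_match(query, names)
  needle = query.downcase
  candidates = names.select { |name| name.downcase.include?(needle) }
  candidates.min_by { |name| levenshtein(needle, name.downcase) }
end
```

Because every candidate contains the query, this particular combination ends up favoring the shortest matching name; a different filter or metric would rank the foods differently.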
Parameter **weight** is the weight to measure in grams, defaulting to 100.
(Recall that the nutrient information of each record of the database is
based upon 100 grams.) Your report should output numerical information that
corresponds to the weight requested. There is information in the document
provided that explains how to adjust for weight.
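Since every record is expressed per 100 grams, the adjustment can be sketched as a simple linear scaling. The helper name `scale_for_weight` is hypothetical, and passing missing values through as `nil` is an assumption; the provided document remains the authority on the exact formula.

```ruby
# Scale a per-100-gram nutrient value to the requested weight in grams.
# Assumption: a missing value (nil) stays nil rather than becoming 0.
def scale_for_weight(value_per_100g, weight)
  return nil if value_per_100g.nil?
  value_per_100g * weight / 100.0
end
```

For example, a food listed with 20.0 g of protein per 100 g would report 10.0 g for a 50 g serving.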
The output you provide is mostly up to you, but should include as a minimum:
+ Full food name (as found in the database, not the search string)
+ Food weight (as provided to the function)
+ Nutrient values for:
    - Water
    - Protein
    - Carbohydrates (the `Carbohydrt` field)
    - Fats (sum of the fields `FA_Sat`, `FA_Mono` and `FA_Poly`)
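A minimal sketch of how the required items might be assembled, assuming the record has already been loaded into a hash keyed by field name. `Carbohydrt`, `FA_Sat`, `FA_Mono` and `FA_Poly` are the field names given above; the keys `Shrt_Desc`, `Water` and `Protein`, the helper name `format_report`, and the choice to count a missing value as zero are all assumptions made for illustration.

```ruby
# Build the minimum report as a string (the caller can puts it).
# Assumed keys: "Shrt_Desc", "Water", "Protein"; missing values count as 0.
def format_report(record, weight = 100)
  scale = weight / 100.0
  amount = ->(key) { (record[key] || 0.0) * scale }
  fats = amount.call('FA_Sat') + amount.call('FA_Mono') + amount.call('FA_Poly')

  [
    record['Shrt_Desc'],
    format('Weight:        %.2f g', weight),
    format('Water:         %.2f g', amount.call('Water')),
    format('Protein:       %.2f g', amount.call('Protein')),
    format('Carbohydrates: %.2f g', amount.call('Carbohydrt')),
    format('Fats:          %.2f g', fats)
  ].join("\n")
end
```

Returning a string rather than printing directly keeps the formatting easy to test; `nutrient_report` could simply `puts` the result.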
A few more things to consider. First, the database contains information for
over 7,500 food items. That may be a lot to search and do string comparisons
on. If you find your searches going very slowly, consider caching the data
to a more search-efficient format.
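One way to cache the parsed data, sketched with Ruby's built-in `Marshal`: parse the text file once, dump the resulting structure to a binary file, and load that file on later runs. The helper name `load_with_cache` and the cache path are hypothetical, and this sketch makes no attempt to notice when ABBREV.txt changes.

```ruby
# Return cached data if the cache file exists; otherwise run the block,
# store its result with Marshal, and return it.
# Assumption: the cache is never stale (delete the file to rebuild it).
def load_with_cache(cache_path)
  if File.exist?(cache_path)
    File.open(cache_path, 'rb') { |f| Marshal.load(f) }
  else
    data = yield
    File.open(cache_path, 'wb') { |f| Marshal.dump(data, f) }
    data
  end
end

# Hypothetical usage, with parse_database standing in for your own loader:
#   foods = load_with_cache('abbrev.cache') { parse_database('ABBREV.txt') }
```

Only load cache files you created yourself, since `Marshal.load` is not safe on untrusted input.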
Second, consider writing some tests with database integrity in mind. For
example, at a quick glance, it appears that all the food names are presented
in the database in full-caps. But if you base your search on this assumption,
you may miss at least one food (or perhaps more) in your search, as at least
one food was entered into ABBREV.txt in mixed-case. There may be other errors
in the file, so consider doing a few sanity checks on the data file before
diving into the heart of the quiz. (Feel free to post integrity test code
to the mailing list before the waiting period is up.)
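Two such sanity checks might look like the following sketch, which flags names that are not entirely upper-case and records whose NDB number breaks the sort order. The helper name `integrity_problems` is hypothetical, the name is assumed to be the second field, and NDB numbers are assumed to be equal-length strings so that plain string comparison reflects their order.

```ruby
# Return a list of human-readable problems found in the given record lines.
# Assumptions: field 1 is the NDB number, field 2 is the food name, and
# NDB numbers are zero-padded to the same length.
def integrity_problems(lines)
  problems = []
  previous_ndb = nil
  lines.each_with_index do |line, idx|
    fields = line.chomp.split('^', -1)
    ndb  = fields[0].to_s.delete('~')
    name = fields[1].to_s.delete('~')
    if name != name.upcase
      problems << "line #{idx + 1}: name not upper-case: #{name}"
    end
    if previous_ndb && ndb < previous_ndb
      problems << "line #{idx + 1}: NDB #{ndb} out of order"
    end
    previous_ndb = ndb
  end
  problems
end

# Hypothetical usage:
#   puts integrity_problems(File.readlines('ABBREV.txt'))
```

Further checks, such as a consistent field count per line, would follow the same pattern.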
Third, and finally, part of the goal here is to make available another
large, interesting database for future Ruby Quiz problems. There are plenty
of opportunities available here... meal planning is just one example.
Keep this in mind while designing your solution: we want a firm foundation
for searching this nutrient database so that future problems can focus on
examining the results of the search.