Hello all,
I would like to see if someone has a solution for a problem that's been giving me a headache for some time now:
I have a text file of "patterns" that I'd like to find and count in a target file:
FileA.txt (patterns)
apple
orange
cherry
melon
watermelon
tangerine
kiwi
FileB.txt (target)
red apples
green apples
black cherry
red cherry
green melon
watermelon
tangerine
I want to generate an output file that would read:
FileB_counts.txt
apple 2
orange 0
cherry 2
melon 2
watermelon 1
tangerine 1
kiwi 0
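Here is a minimal Python sketch of the counting rule the expected output implies. It assumes each target line counts at most once per pattern, as COUNTIF does with cells, and that a pattern may sit inside a longer word. The function names are hypothetical. This brute-force version checks every pattern against every line, so it only pins down the semantics and does not fix the speed problem.

```python
def read_lines(path):
    """Read a text file into a list of lines (hypothetical helper).

    Trailing newlines (including Windows CRLF) are stripped, and blank lines are
    dropped so that an empty pattern cannot match every target line.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


def count_lines_containing(patterns, lines):
    """Return {pattern: number of lines containing pattern as a substring}.

    Each line counts at most once per pattern, so "watermelon" counts toward
    both "melon" and "watermelon".
    """
    return {pattern: sum(1 for line in lines if pattern in line) for pattern in patterns}


def format_counts(counts):
    # One "pattern count" line per pattern, in the order of the pattern file.
    return "\n".join(f"{pattern} {n}" for pattern, n in counts.items())


if __name__ == "__main__":
    # Sample data from the post; for real files use read_lines("FileA.txt") etc.
    patterns = ["apple", "orange", "cherry", "melon", "watermelon", "tangerine", "kiwi"]
    lines = ["red apples", "green apples", "black cherry", "red cherry",
             "green melon", "watermelon", "tangerine"]
    print(format_counts(count_lines_containing(patterns, lines)))
```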
This is easy in Excel using COUNTIF; however, my pattern file has hundreds of thousands of "patterns" and the target file has millions of lines. I have to leave the computer running all night; it does finish, but it takes far too long. I'm sure there has to be a better way using either Unix/bash or a Python program.
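With hundreds of thousands of patterns and millions of lines, checking every pattern against every line is what makes this slow. One standard technique for matching many literal strings at once is the Aho-Corasick automaton, which scans each line once no matter how many patterns there are. The block below is a pure-Python sketch under the same assumptions as above (substring matches, each line counted once per pattern). The names are hypothetical, and the code has not been tuned or timed on files of this size.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton over literal patterns.

    goto[s] maps a character to the next state, fail[s] is the failure link,
    and out[s] lists the indices of every pattern that ends at state s
    (including patterns inherited through failure links).
    """
    goto, fail, out = [{}], [0], [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(idx)

    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # follow their parent's failure chain until the character can be extended.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # e.g. the state for "watermelon" also reports "melon".
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def count_lines_aho_corasick(patterns, lines):
    """Return {pattern: number of lines containing it}, scanning each line once."""
    goto, fail, out = build_automaton(patterns)
    counts = [0] * len(patterns)
    for line in lines:
        state = 0
        seen = set()  # pattern indices found in this line (count each once)
        for ch in line:
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            seen.update(out[state])
        for idx in seen:
            counts[idx] += 1
    return dict(zip(patterns, counts))
```

To use it on the real files, read FileA.txt and FileB.txt into lists of lines with the trailing newlines stripped, then write each `pattern count` pair to FileB_counts.txt. The work now grows roughly with the total pattern length plus the target file length plus the number of matches, rather than with patterns × lines. Because the scan loop is still interpreted Python, a C-implemented Aho-Corasick library may be noticeably faster.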
I tried using bash commands in Ubuntu:
grep -cf FileA.txt FileB.txt >> output.txt
but the output file just contains "8". `grep -c` reports a single total, the number of lines that matched any of the patterns. What I really need is to see which patterns matched and how many times.
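For comparison, this Python sketch (hypothetical function name) computes the kind of number `grep -c -f` produces: how many target lines contain at least one pattern. Every match collapses into one total, which is why the per-pattern breakdown is lost. Note that grep treats each line of FileA.txt as a regular expression unless `-F` is given; this sketch only does literal substring matching.

```python
def count_matching_lines(patterns, lines):
    """Number of lines containing at least one pattern (a single total).

    This mirrors what `grep -c -f patterns target` reports: a line is counted
    once no matter how many patterns it contains, and the patterns that
    matched are not recorded.
    """
    return sum(1 for line in lines if any(p in line for p in patterns))
```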
I included "melon" and "watermelon" on purpose to show that a match does not need to be an exact word; a pattern can be a substring of a longer word.
I hope someone has experience and can give me some guidance on how to solve this.
Thanks!