
My loop is outputting to one concatenated file instead of several individual files



I have a bunch of TSV files, each with 7 columns, but I am only interested in columns 1 and 7. Each file is named SampleName.bam.S.txt.

Example: 7805.bam.S.txt 7806.bam.S.txt 7808.bam.S.txt etc...

I've tried two things:

1) find . -type f -name '*.S.txt' -exec cut -f 1,7 {} > {}.F \;
2) for f in '*.S.txt';do cut -f 1,7 "$f" > "$f".F;done

What I want is for my directory to now contain:

7805.bam.S.txt 7805.bam.S.txt.F 7806.bam.S.txt 7806.bam.S.txt.F 7808.bam.S.txt 7808.bam.S.txt.F etc...

but instead I just get

1) 7805.bam.S.txt 7806.bam.S.txt 7808.bam.S.txt etc... {}.F
2) 7805.bam.S.txt 7806.bam.S.txt 7808.bam.S.txt etc... $f.F

where that single generated file contains the output of every iteration. How can I get each iteration to write to its own, uniquely named file? Thanks.

Best answer

If I understand you correctly, this is how I would do it. I agree with the other answer that awk is the right tool for TSV/CSV data; I just find bash syntax for iteration and variables easier to remember than awk syntax.

find . -type f -name "*.S.txt" | while IFS= read -r FILE; do awk -F"\t" '{print $1"\t"$7}' "$FILE" > "$FILE.F"; done
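The question's second attempt fails for a separate reason: the quotes in '*.S.txt' stop the shell from expanding the pattern, so the loop body runs exactly once, with f set to the literal string *.S.txt. Leaving the pattern unquoted is enough to get one output file per input. A minimal sketch, assuming all the files sit in one directory; the function name extract_cols_glob is hypothetical and only exists so the loop can be pointed at a directory:

```shell
#!/bin/sh
# extract_cols_glob: hypothetical helper name. Runs in a subshell so the
# cd does not affect the caller; the directory argument defaults to ".".
extract_cols_glob() (
  cd -- "${1:-.}" || exit 1
  for f in *.S.txt; do            # unquoted, so the shell expands the pattern
    [ -e "$f" ] || continue       # no match leaves the literal pattern; skip it
    cut -f 1,7 -- "$f" > "$f.F"   # one output file per input file
  done
)
```

For example, extract_cols_glob . processes the current directory. Plain cut is enough here because it splits fields on tabs by default.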

Another answer

Based on your examples, awk might be a better candidate for this:

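find . -maxdepth 1 -name "*.S.txt" -exec sh -c 'awk -F "\t" '\''{ printf "%s\t%s\n", $1, $7 }'\'' "$1" > "$1.F"' sh '{}' \;

This finds all files in the current directory ending in .S.txt and, for each one, runs an awk statement that sets tab as the field separator and prints only the 1st and 7th fields, separated by a tab. The output goes to a file with the same name plus ".F" at the end. The redirection has to live inside the sh -c script: if you write > '{}'.F directly on the find command line, your interactive shell performs that redirection once, before find even starts, and creates a single file literally named {}.F, which is exactly the problem in the question. Inside sh -c, a fresh shell handles each file, with that file's name in $1.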
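If you prefer cut, or want the batching that + gives while still using a shell redirection, the sh -c wrapper can receive many files at once and loop over them. Because sh performs the redirection separately for each file, every input still gets its own .F output. A minimal sketch; extract_cols_find is a hypothetical wrapper name, and -maxdepth 1 keeps find in the top-level directory as in the answer above:

```shell
#!/bin/sh
# extract_cols_find: hypothetical helper name; directory defaults to ".".
extract_cols_find() {
  find "${1:-.}" -maxdepth 1 -type f -name '*.S.txt' -exec sh -c '
    for f in "$@"; do              # "$@" holds the batch of paths from find
      cut -f 1,7 -- "$f" > "$f.F"  # this redirection runs once per input file
    done
  ' sh {} +
}
```

The trailing sh becomes $0 inside the script, so the file names start at $1 and "$@" covers all of them.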

Alternatively, you can have awk write to the output files itself. That lets find hand over files in groups (the + terminator instead of \;), which starts far fewer processes and increases efficiency:

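find . -maxdepth 1 -name "*.S.txt" -exec awk -F "\t" 'FNR == 1 { if (out) close(out); out = FILENAME ".F" } { printf "%s\t%s\n", $1, $7 > out }' '{}' +

Using > inside awk truncates each output file when it is first opened and keeps appending to it for the rest of the run, so rerunning the command replaces the old .F files instead of appending duplicate lines as >> would. Closing the previous output whenever a new input file starts (FNR == 1) also keeps some awk implementations from running out of open files on large batches.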
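When the files all sit in one directory, find is optional: awk accepts any number of file operands, so the shell glob can supply them directly. A minimal sketch under that assumption; extract_cols_awk is a hypothetical name, and a very large number of files could still exceed the system's argument-length limit, in which case the find -exec ... + form is the safer choice:

```shell
#!/bin/sh
# extract_cols_awk: hypothetical helper name; directory defaults to ".".
extract_cols_awk() (
  cd -- "${1:-.}" || exit 1
  set -- *.S.txt                # expand the glob into the positional parameters
  [ -e "$1" ] || exit 0         # nothing matched: the pattern stayed literal
  awk -F '\t' '
    FNR == 1 { if (out) close(out); out = FILENAME ".F" }  # new input file
    { printf "%s\t%s\n", $1, $7 > out }                    # columns 1 and 7
  ' "$@"
)
```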