I am very new to the Linux shell.
I have a fasta file from which I would like to extract a specific part from the header.
The fasta headers look like this:
lcl|CP000046.1_cds_AAW37389.1_1 [gene=dnaA] [locus_tag=SACOL0001] [protein=chromosomal replication initiator protein DnaA] [protein_id=AAW37389.1] [location=544…1905] [gbkey=CDS]
lcl|CP000046.1_cds_AAW37391.1_3 [locus_tag=SACOL0003] [protein=conserved hypothetical protein] [protein_id=AAW37391.1] [location=3697…3942] [gbkey=CDS]
I would like the output file to look like this:
I have been trying to use awk with various print options, but I could not solve it. As you may have noticed, the SACOL tag is not always the 3rd field of the header, which does not make my life easier.
Is there a way to print only the value that follows locus_tag= ?
Thank you very much for your help.
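One possible approach, sketched here with sed instead of awk: match the bracketed locus_tag field wherever it sits on the line and print only the text between "locus_tag=" and the closing "]". The function name extract_locus_tags and the file names in the comments are made up for illustration, and the sketch assumes each header has at most one [locus_tag=...] field.

```shell
# Hypothetical helper: print the value of every [locus_tag=...] field in a file.
# Assumption: each header line contains at most one locus_tag field.
extract_locus_tags() {
    # -n turns off automatic printing; the trailing "p" prints only the
    # lines where the substitution matched, so sequence lines are skipped.
    # \([^]]*\) captures everything up to the next "]", and \1 keeps just that.
    sed -n 's/.*\[locus_tag=\([^]]*\)].*/\1/p' "$1"
}

# Example call (file names are placeholders):
#   extract_locus_tags input.fasta > locus_tags.txt
```

Because the pattern looks for the literal text "[locus_tag=" and not a field position, it should not matter whether the tag is the 2nd or 3rd field. For the two headers shown above, this should print SACOL0001 and SACOL0003, one per line.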