awk

Remove duplicate lines from a file that contains a timestamp

awk '!seen[substr($0,29)]++' /opt/jira/log/atlassian-jira.log

Explanation of the command:

  • awk: This is a text processing tool that can be used to perform various text manipulation tasks.

  • !seen[substr($0,29)]++: This is the awk program. It consists of a pattern with no action, so awk applies its default action, printing the line, whenever the pattern evaluates to true.

  • $0: This represents the entire line.

  • substr($0,29): This function extracts a substring starting from the 29th character of the line until the end of the line, which skips the first 28 characters.
  • !seen[]++: 'seen' is an associative array keyed by the substring. The '++' post-increment returns the current count for that key (0 the first time) and then adds 1 to it. The '!' (not) operator negates the returned count. On the first occurrence of a substring the count is 0, so !seen[]++ evaluates to true (1) and the line is printed. On any later occurrence the count is already non-zero, so it evaluates to false (0) and the line is not printed.

  • /opt/jira/log/atlassian-jira.log: This is the input file the awk script is applied to.
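
To make the idiom concrete, the one-liner can be written out in long form. The following is only a sketch: the three sample lines and their 28-character bracketed timestamp prefix are made up for illustration, so the offset 29 matches them by construction and may need adjusting for a real log.

```shell
# Hypothetical sample input: each line begins with a 28-character
# timestamp prefix, so the message text starts at character 29.
sample='[2024-01-15 10:00:01 +0000] Cache flushed
[2024-01-15 10:00:02 +0000] Index rebuilt
[2024-01-15 10:00:03 +0000] Cache flushed'

# Long-form equivalent of: awk '!seen[substr($0,29)]++'
result=$(printf '%s\n' "$sample" | awk '{
    key = substr($0, 29)      # everything after the timestamp prefix
    if (!(key in seen)) {     # first time this message text appears
        print                 # keep the whole line, timestamp included
    }
    seen[key]++               # record the occurrence
}')

printf '%s\n' "$result"
```

With this sample, the third line is dropped because its text from character 29 onwards ("Cache flushed") was already seen on the first line, even though the timestamps differ.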

Overall, this command reads the input file line by line, uses the substring from the 29th character onwards as a key, and prints only the first line seen for each key. Later lines that differ only in their first 28 characters (here, the timestamp) are dropped, and the remaining lines keep their original order.
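
Because awk only writes to standard output, the log itself is left untouched. To keep the result, redirect it to a new file. A minimal sketch, assuming a throwaway directory from mktemp; the file names jira-sample.log and jira-deduped.log and the log lines are invented for the example:

```shell
# Work in a temporary directory so no real log is touched.
workdir=$(mktemp -d)
log="$workdir/jira-sample.log"          # hypothetical input file
deduped="$workdir/jira-deduped.log"     # hypothetical output file

# Made-up lines with a 28-character timestamp prefix.
cat > "$log" <<'EOF'
[2024-01-15 10:00:01 +0000] Cache flushed
[2024-01-15 10:00:02 +0000] Index rebuilt
[2024-01-15 10:00:03 +0000] Cache flushed
[2024-01-15 10:00:04 +0000] Index rebuilt
EOF

# Write the de-duplicated lines to a separate file.
awk '!seen[substr($0,29)]++' "$log" > "$deduped"

# Count lines in both files; tr strips padding some wc versions add.
before=$(wc -l < "$log" | tr -d ' ')
after=$(wc -l < "$deduped" | tr -d ' ')
echo "lines before: $before, after: $after"
```

Do not redirect the output to the same file that awk is reading: the shell truncates the target before awk starts, so the input would be lost.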