Linux sed remove duplicates and get unique values



Removing duplicates and getting unique values is quite easy when the input data follows a convenient format, for example when the values in the string or raw data are separated by spaces.

A dilemma occurs when the data has no spaces between the values and they are separated by dashes instead.

So, how do you remove duplicates, get the unique values, and still retain the format of the raw data?

Take this raw data (just a sample string):
the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

A first attempt at removing duplicates and getting unique values might be this command:

sort duplicates.txt | uniq (this works as-is only when each value is on its own line, because sort and uniq compare whole lines)

duplicates.txt is assumed to contain the string illustrated above.

Sample output:
the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

The output is exactly the same as the input. Why? Because sort and uniq compare whole lines, and the entire string is one line: the dashes join the words together, so there is no delimiter that splits the string into separate values.
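
To see this behaviour without creating a file, here is a minimal sketch that feeds the sample string straight into the same pipeline. The variable names (line, result, line_count) are made up for this illustration.

```shell
#!/bin/sh
# The sample string from this post, held on a single line.
line='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'

# sort and uniq compare whole lines; with only one line there is nothing to remove.
result=$(printf '%s\n' "$line" | sort | uniq)
printf '%s\n' "$result"

# Counting the lines confirms that the input is a single record.
line_count=$(printf '%s\n' "$line" | wc -l | tr -d ' ')
printf 'lines: %s\n' "$line_count"
```

The printed string is identical to the input, and the line count is 1.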

So, if the requirement is to remove duplicates, get the unique values, and keep the dashes in the final output, how can it be done?

Removing the duplicates and getting the unique values while retaining the same format can be done in 3 steps:
  1. Remove the dashes from the raw data
  2. Remove the duplicates and get unique values
  3. Put the dashes back in the final output
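
The three steps above can also be sketched as a single pipeline with no intermediate files. This is only an illustration, not the script from this post: it swaps in tr and paste for sed and xargs, and dedupe_dashed is a hypothetical helper name. LC_ALL=C is set so that the sort order does not depend on the locale.

```shell
#!/bin/sh
# dedupe_dashed: hypothetical helper, reads dash-separated text on stdin.
dedupe_dashed() {
    # Step 1: turn every dash into a newline, one value per line.
    # Step 2: sort the values and keep one copy of each.
    # Step 3: join the lines back together with dashes.
    tr -s '-' '\n' | LC_ALL=C sort -u | paste -s -d '-' -
}

input='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'
result=$(printf '%s\n' "$input" | dedupe_dashed)
printf '%s\n' "$result"   # brown-cat-dog-fox-jumps-lazy-over-quick-rabbit-the
```

To run it on a file instead, something like dedupe_dashed < duplicates.txt should work, assuming the file holds the dash-separated string.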


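Here's the shell script:

#!/bin/sh

# Input file that contains the dash-separated string
xfile="file_with_dashes_and_duplicates.txt"

# Step 1: replace every dash with a space
sed 's/-/ /g' "$xfile" > "$HOME/file_with_no_dashes.txt"

# Step 2: put one word per line, sort and drop duplicates, then rejoin on one line
xargs -n1 < "$HOME/file_with_no_dashes.txt" | sort -u | xargs > "$HOME/file_no_duplicate.txt"
file="$HOME/file_no_duplicate.txt"

# Step 3: replace every space with a dash
sed 's/ /-/g' "$file" > "$HOME/final_output_no_duplicates_unique_content.txt"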
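
One thing to keep in mind about the script above: sort -u returns the unique values in alphabetical order, so the original word order is lost. If the order of first appearance matters, a possible variant (a sketch, not part of the original script) is to drop sort and let awk keep only the first occurrence of each value. The variable names and the seen array are made up for this illustration.

```shell
#!/bin/sh
input='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'

# tr puts one value per line; awk prints a line only the first time it is seen;
# paste joins the surviving lines back together with dashes.
ordered=$(printf '%s\n' "$input" | tr -s '-' '\n' | awk '!seen[$0]++' | paste -s -d '-' -)
printf '%s\n' "$ordered"   # the-quick-brown-fox-jumps-over-lazy-dog-cat-rabbit
```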

=======
sed 's/-/ /g' == find each dash "-" and substitute it with a space (the part between the second and third slash). g (global) == apply the substitution to every match on the line, not just the first one

sed 's/ /-/g' == find each space and substitute it with a dash, g (global) == all matching characters on the line
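
The effect of the g flag is easy to check on a short string. A minimal sketch follows; the sample value and variable names are made up for this illustration.

```shell
#!/bin/sh
sample='the-lazy-dog'

# Without g, sed replaces only the first dash on the line.
first_only=$(printf '%s\n' "$sample" | sed 's/-/ /')

# With g, sed replaces every dash on the line.
every_match=$(printf '%s\n' "$sample" | sed 's/-/ /g')

printf '%s\n' "$first_only"    # the lazy-dog
printf '%s\n' "$every_match"   # the lazy dog
```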

> file_no_duplicate.txt == IO redirection to the file specified

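Output of the above script at each stage, using the sample string:

Raw file or input data:
the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

File or data with no dashes:
the quick brown fox jumps over the lazy dog jumps over the lazy cat jumps over the rabbit

File or data with no duplicates:
brown cat dog fox jumps lazy over quick rabbit the

The final output:
brown-cat-dog-fox-jumps-lazy-over-quick-rabbit-the
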

Cheers. Enjoy Linux!!!
