Linux sed remove duplicates and get unique values

Removing duplicates and getting unique values is easy when the input data follows a convenient format, for example when the values are separated by spaces or sit on separate lines.

It gets trickier when the words of the string have no spaces between them and are joined by dashes instead.

So, how do you remove the duplicates, get the unique values, and still retain the format of the raw data?

Take this raw data, for example (just a sample string):

the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

The usual way to remove duplicates and get unique values is this command:

sort duplicates.txt | uniq (this works when each value is on its own line)

duplicates.txt is assumed to contain the string shown above.

Sample output:

the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

The output is exactly the same as the input. Why? Because sort and uniq compare whole lines, the entire string is treated as one literal value. The dashes join the words together, so there is no delimiter left to split them on.
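A quick way to confirm this behaviour is to feed the sample string through the same pipeline and compare the result with the input. This is only a minimal sketch: the variable names line and result are made up for illustration, and the string is piped in with printf instead of being read from duplicates.txt.

```shell
#!/bin/sh
# The sample string from above, held as a single line.
line='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'

# sort and uniq compare whole lines, so one line in means the same line out.
result=$(printf '%s\n' "$line" | sort | uniq)

if [ "$result" = "$line" ]; then
    echo "unchanged: sort | uniq only saw one line"
fi
```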

For example, suppose the requirement is to remove the duplicates, keep the unique values, and retain the dashes in the final output. How can that be done?

Removing the duplicates while keeping the same format can be done in 3 steps:

1. Remove the dashes from the raw data
2. Remove the duplicates and get the unique values
3. Put the dashes back into the final output

Here’s the shell script:

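#!/bin/sh

# Input file containing the dash-separated string
xfile="file_with_dashes_and_duplicates.txt"

# Step 1: replace every dash with a space
sed 's/-/ /g' "$xfile" > "$HOME/file_with_no_dashes.txt"

# Step 2: put one word per line, sort and drop duplicates, then join back into one line
xargs -n1 < "$HOME/file_with_no_dashes.txt" | sort -u | xargs > "$HOME/file_no_duplicate.txt"

file="$HOME/file_no_duplicate.txt"

# Step 3: replace every space with a dash
sed 's/ /-/g' "$file" > "$HOME/final_output_no_duplicates_unique_content.txt"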
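The same three steps can also be sketched as a single pipeline with no intermediate files. This is an alternative to the script above, not a replacement for it: it swaps the sed substitutions for tr and paste, and the variable names input and output are made up for illustration.

```shell
#!/bin/sh
input='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'

# 1. tr turns every dash into a newline, so each word sits on its own line.
# 2. sort -u sorts the words and keeps one copy of each.
# 3. paste -s joins the lines back into one, using a dash as the delimiter.
output=$(printf '%s\n' "$input" | tr '-' '\n' | sort -u | paste -s -d '-' -)

echo "$output"
# prints: brown-cat-dog-fox-jumps-lazy-over-quick-rabbit-the
```

Splitting on newlines instead of spaces lets sort -u do the deduplication directly, so the two xargs calls are not needed.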

=======

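sed 's/-/ /g' == find each dash ("-") and substitute a space for it. The g (global) flag applies the substitution to every match on the line, not just the first one.

sed 's/ /-/g' == find each space and substitute a dash for it. The g (global) flag again covers every match on the line.

> file == I/O redirection: write the command's output to the named file (here, file_no_duplicate.txt), overwriting it if it already exists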
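To see what the g flag changes, the substitutions can be tried on a short string. This is a small illustrative sketch: the string lazy-dog-lazy-cat and the variable names are made up for the example.

```shell
#!/bin/sh
sample='lazy-dog-lazy-cat'

# Without g, only the first dash on the line is replaced.
first_only=$(printf '%s\n' "$sample" | sed 's/-/ /')
echo "$first_only"   # lazy dog-lazy-cat

# With g, every dash on the line is replaced.
all=$(printf '%s\n' "$sample" | sed 's/-/ /g')
echo "$all"          # lazy dog lazy cat

# The reverse substitution puts the dashes back.
restored=$(printf '%s\n' "$all" | sed 's/ /-/g')
echo "$restored"     # lazy-dog-lazy-cat
```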

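Output of the above script at each stage:

Raw file or input data:

the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit

File or data with no dashes:

the quick brown fox jumps over the lazy dog jumps over the lazy cat jumps over the rabbit

File or data with no duplicates:

brown cat dog fox jumps lazy over quick rabbit the

The final output:

brown-cat-dog-fox-jumps-lazy-over-quick-rabbit-the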
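Because sort -u sorts its input, the script's final output lists the unique words in alphabetical order, not in the order they first appeared. If the original order matters, one possible variation (an assumption about the requirement, not part of the script above) is to replace sort -u with a small awk filter that keeps only the first occurrence of each word. The variable names input and output are made up for illustration.

```shell
#!/bin/sh
input='the-quick-brown-fox-jumps-over-the-lazy-dog-jumps-over-the-lazy-cat-jumps-over-the-rabbit'

# tr puts one word per line; awk prints a line only the first time it is seen;
# paste joins the surviving lines back together with dashes.
output=$(printf '%s\n' "$input" | tr '-' '\n' | awk '!seen[$0]++' | paste -s -d '-' -)

echo "$output"
# prints: the-quick-brown-fox-jumps-over-lazy-dog-cat-rabbit
```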

Cheers. Enjoy Linux!!!
