Starting from .NET 6, a new DistinctBy LINQ operator is available:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector);
Returns distinct elements from a sequence according to a specified key selector function.
Usage example:
List<Item> distinctList = listWithDuplicates
    .DistinctBy(i => i.Id)
    .ToList();
There is also an overload that has an IEqualityComparer<TKey> parameter.
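As a minimal sketch of the comparer overload, the example below treats keys that differ only in letter case as equal. The Item record, its Code and Name properties, and the DistinctByCodeIgnoringCase helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type, used only for this illustration.
public record Item(string Code, string Name);

public static class DistinctByDemo
{
    public static List<Item> DistinctByCodeIgnoringCase(IEnumerable<Item> items)
    {
        ArgumentNullException.ThrowIfNull(items);

        // The second argument is the IEqualityComparer<TKey> used to compare keys,
        // so "a1" and "A1" are considered the same key here.
        return items
            .DistinctBy(i => i.Code, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Passing one of the built-in StringComparer instances avoids having to write a custom comparer for the common case of string keys.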
Update in-place: In case creating a new List<T> is not desirable, here is a RemoveDuplicates extension method for the List<T> class:
/// <summary>
/// Removes all the elements that are duplicates of previous elements,
/// according to a specified key selector function.
/// </summary>
/// <returns>
/// The number of elements removed.
/// </returns>
public static int RemoveDuplicates<TSource, TKey>(
    this List<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> keyComparer = null)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(keySelector);
    HashSet<TKey> hashSet = new(keyComparer);
    // HashSet<T>.Add returns false when the key has been seen before,
    // so every element whose key was already added gets removed.
    return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
}
This method is efficient (O(n)) but also a bit dangerous, because it is based on the potentially corruptive List<T>.RemoveAll method¹. In case the keySelector lambda succeeds for some elements and then fails for another element, the partially modified List<T> will neither be restored to its initial state, nor will it be in a state recognizable as the result of successful individual Remove calls. Instead it will transition to a corrupted state that includes duplicate occurrences of existing elements. So in case the keySelector lambda is not fail-proof, the RemoveDuplicates method should be invoked in a try block that has a catch block where the potentially corrupted list is discarded.
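The try/catch advice above could be sketched roughly as follows. The TryRemoveDuplicates wrapper and the RemoveDuplicatesGuardDemo class are hypothetical names invented for this illustration, and RemoveDuplicates is repeated only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class ListExtensions
{
    // Same method as shown earlier, repeated so the sketch is self-contained.
    public static int RemoveDuplicates<TSource, TKey>(
        this List<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> keyComparer = null)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(keySelector);
        HashSet<TKey> hashSet = new(keyComparer);
        return source.RemoveAll(item => !hashSet.Add(keySelector(item)));
    }
}

public static class RemoveDuplicatesGuardDemo
{
    // Hypothetical wrapper: returns true on success. On failure it drops
    // the caller's reference, so the possibly corrupted list cannot be reused.
    public static bool TryRemoveDuplicates<T, TKey>(
        ref List<T> list, Func<T, TKey> keySelector)
    {
        try
        {
            list.RemoveDuplicates(keySelector);
            return true;
        }
        catch (Exception)
        {
            // Elements may have been shifted inside the list before the
            // key selector failed, so its contents are no longer trusted.
            list = null;
            return false;
        }
    }
}
```

A key selector such as s => int.Parse(s) is a typical example of a lambda that is not fail-proof: it succeeds for "1" and "2" and then throws for "x". After a failed call, the caller would reload or rebuild the data instead of continuing with the same list instance.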
Alternatively, you could substitute the dangerous built-in RemoveAll with a safe custom implementation that offers predictable behavior.
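One possible shape for such a replacement is sketched below. It is an assumption of mine rather than a built-in API: the SafeRemoveAll name and the two-phase approach are introduced only for illustration. It evaluates the predicate for every element before touching the list, trading O(n) extra memory for the guarantee that a throwing predicate leaves the list unchanged.

```csharp
using System;
using System.Collections.Generic;

public static class SafeListExtensions
{
    // Hypothetical safe alternative to List<T>.RemoveAll.
    // Returns the number of elements removed.
    public static int SafeRemoveAll<T>(this List<T> source, Predicate<T> match)
    {
        ArgumentNullException.ThrowIfNull(source);
        ArgumentNullException.ThrowIfNull(match);

        // Phase 1: run the user-supplied predicate on every element and
        // collect the survivors. The source list is only read here, so an
        // exception thrown by the predicate leaves it untouched.
        var survivors = new List<T>(source.Count);
        foreach (T item in source)
        {
            if (!match(item)) survivors.Add(item);
        }

        // Phase 2: no user code runs from this point on.
        int removed = source.Count - survivors.Count;
        if (removed > 0)
        {
            source.Clear();
            source.AddRange(survivors);
        }
        return removed;
    }
}
```

With this in place, the last line of RemoveDuplicates could call source.SafeRemoveAll(...) instead of source.RemoveAll(...), keeping the O(n) running time.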
¹ For all .NET versions and platforms, including the latest .NET 7. I have submitted a proposal on GitHub to document the corruptive behavior of the List<T>.RemoveAll method, and the feedback that I received was that the behavior should not be documented and the implementation should not be fixed.