Rust Collections
Rust, known for its focus on memory safety, performance, and concurrency, provides a rich set of data structures in its standard library, collectively known as "collections." These collections are essential for organizing and manipulating data efficiently in your applications. While Vec
and HashMap
are often the go-to choices for most general-purpose data storage, Rust offers a variety of other specialized collections, each with its own strengths and optimal use cases 2.
This blog post will guide you through Rust's most common collections, explaining their purpose, typical use cases, and providing practical code examples. We'll also touch upon how to use custom hashing algorithms for HashMap
and HashSet
.
Rust's Core Collections: An Overview
Rust's collections can be broadly categorized into sequences, maps, sets, and miscellaneous structures 2. Understanding their underlying implementations and performance characteristics is key to choosing the right tool for the job.
1. Vec<T>
(Vector)
A Vec<T>
is a dynamic, resizable array that can grow or shrink in size as needed 13. It stores elements contiguously in memory, offering excellent cache performance and O(1) access to elements by index 2.
-
Common Use Cases:
-
Code Example:
fn main() {
// Create a new, empty vector
let mut numbers: Vec<i32> = Vec::new();
// Add elements
numbers.push(1);
numbers.push(2);
numbers.push(3);
println!("Vector after pushes: {:?}", numbers); // Output: [1, 2, 3]
// Use the vec! macro for initial values
let mut more_numbers = vec![4, 5, 6];
println!("Vector created with macro: {:?}", more_numbers); // Output: [4, 5, 6]
// Access elements by index
println!("First element: {}", numbers[0]); // Output: 1
// Remove the last element (pop)
let last_num = numbers.pop();
println!("Popped element: {:?}", last_num); // Output: Some(3)
println!("Vector after pop: {:?}", numbers); // Output: [1, 2]
// Iterate over elements
for num in &more_numbers {
print!("{} ", num); // Output: 4 5 6
}
println!();
}
2. VecDeque<T>
(Vector Deque)
A VecDeque<T>
is a double-ended queue implemented as a ring buffer. It provides efficient insertion and removal at both the front and back of the sequence, unlike Vec
which is efficient only at the back 2.
-
Common Use Cases:
- Implementing a queue (FIFO - First-In, First-Out).
- Implementing a double-ended queue (deque).
- When you need
Vec
-like behavior but also require efficient insertions/deletions at the front 2.
-
Code Example:
use std::collections::VecDeque;
fn main() {
let mut deque: VecDeque<char> = VecDeque::new();
// Add elements to the back
deque.push_back('a');
deque.push_back('b');
println!("Deque after push_back: {:?}", deque); // Output: ['a', 'b']
// Add elements to the front
deque.push_front('x');
deque.push_front('y');
println!("Deque after push_front: {:?}", deque); // Output: ['y', 'x', 'a', 'b']
// Remove elements from the front
let front_char = deque.pop_front();
println!("Popped from front: {:?}", front_char); // Output: Some('y')
println!("Deque after pop_front: {:?}", deque); // Output: ['x', 'a', 'b']
// Remove elements from the back
let back_char = deque.pop_back();
println!("Popped from back: {:?}", back_char); // Output: Some('b')
println!("Deque after pop_back: {:?}", deque); // Output: ['x', 'a']
}
3. LinkedList<T>
A LinkedList<T>
is a doubly linked list. While it offers O(1) time complexity for insertions and deletions at either end, and efficient splitting/appending of lists, it's generally less common in Rust than Vec
or VecDeque
due to higher memory overhead and poorer cache performance for sequential access 23.
-
Common Use Cases:
- When you need to frequently split and append lists efficiently 2.
- When you require efficient insertions or deletions in the middle of a list, and you have an iterator pointing to that position.
- When
Vec
orVecDeque
's amortized reallocations are unacceptable, and you need consistent O(1) insertion/deletion at ends 2.
-
Code Example:
use std::collections::LinkedList;
fn main() {
let mut list: LinkedList<i32> = LinkedList::new();
// Add elements to the back
list.push_back(10);
list.push_back(20);
println!("List after push_back: {:?}", list); // Output: [10, 20]
// Add elements to the front
list.push_front(5);
println!("List after push_front: {:?}", list); // Output: [5, 10, 20]
// Remove elements from the front
let front_val = list.pop_front();
println!("Popped from front: {:?}", front_val); // Output: Some(5)
println!("List after pop_front: {:?}", list); // Output: [10, 20]
// Remove elements from the back
let back_val = list.pop_back();
println!("Popped from back: {:?}", back_val); // Output: Some(20)
println!("List after pop_back: {:?}", list); // Output: [10]
// Iterate over elements
for item in &list {
print!("{} ", item); // Output: 10
}
println!();
}
4. HashMap<K, V>
A HashMap<K, V>
is a key-value store that uses a hash table for efficient storage and retrieval. It offers average O(1) time complexity for insertions, deletions, and lookups, making it incredibly fast for many operations 23. The order of elements is not guaranteed.
-
Common Use Cases:
- Associating arbitrary keys with arbitrary values 2.
- Implementing caches.
- Counting occurrences of items.
- Storing configuration settings.
-
Code Example:
use std::collections::HashMap;
fn main() {
let mut scores: HashMap<String, i32> = HashMap::new();
// Insert key-value pairs
scores.insert(String::from("Alice"), 10);
scores.insert(String::from("Bob"), 20);
scores.insert(String::from("Charlie"), 15);
println!("Scores map: {:?}", scores); // Order may vary
// Get a value by key
let alice_score = scores.get("Alice");
println!("Alice's score: {:?}", alice_score); // Output: Some(10)
// Update a value
scores.insert(String::from("Alice"), 12);
println!("Alice's new score: {:?}", scores.get("Alice")); // Output: Some(12)
// Check if a key exists
if scores.contains_key("Bob") {
println!("Bob is in the map.");
}
// Iterate over key-value pairs
for (name, score) in &scores {
println!("{}: {}", name, score);
}
}
5. HashSet<T>
A HashSet<T>
is a collection of unique elements, also implemented using a hash table. It's ideal for scenarios where you need to store distinct items and quickly check for their presence, without caring about the order of elements 23. Like HashMap
, it offers average O(1) performance for insertions, deletions, and membership tests.
-
Common Use Cases:
- Storing distinct items (e.g., unique words in a document).
- Efficiently checking if an item has been seen before.
- Performing set operations like union, intersection, and difference.
-
Code Example:
use std::collections::HashSet;
fn main() {
let mut unique_numbers: HashSet<i32> = HashSet::new();
// Insert elements
unique_numbers.insert(1);
unique_numbers.insert(2);
unique_numbers.insert(3);
unique_numbers.insert(2); // Duplicate, will be ignored
println!("Unique numbers set: {:?}", unique_numbers); // Order may vary, contains {1, 2, 3}
// Check for presence
if unique_numbers.contains(&1) {
println!("1 is in the set.");
}
if !unique_numbers.contains(&4) {
println!("4 is not in the set.");
}
// Remove an element
unique_numbers.remove(&3);
println!("Set after removing 3: {:?}", unique_numbers); // Contains {1, 2}
// Iterate over elements
for num in &unique_numbers {
print!("{} ", num);
}
println!();
}
6. BTreeMap<K, V>
A BTreeMap<K, V>
is a key-value store implemented as a B-tree. Unlike HashMap
, BTreeMap
keeps its elements sorted by key. This provides O(log N) time complexity for insertions, deletions, and lookups, which is slower than HashMap
for individual operations but allows for ordered iteration and efficient range queries 2.
-
Common Use Cases:
-
Code Example:
use std::collections::BTreeMap;
fn main() {
let mut student_grades: BTreeMap<String, char> = BTreeMap::new();
// Insert key-value pairs
student_grades.insert(String::from("Alice"), 'A');
student_grades.insert(String::from("Charlie"), 'C');
student_grades.insert(String::from("Bob"), 'B');
println!("Student grades map: {:?}", student_grades); // Output: {"Alice": 'A', "Bob": 'B', "Charlie": 'C'} (sorted by key)
// Get a value by key
let bob_grade = student_grades.get("Bob");
println!("Bob's grade: {:?}", bob_grade); // Output: Some('B')
// Iterate over key-value pairs (always sorted by key)
for (name, grade) in &student_grades {
println!("{}: {}", name, grade);
}
// Get a range of entries
println!("Grades from B to C:");
for (name, grade) in student_grades.range("Bob".to_string()..="Charlie".to_string()) {
println!("{}: {}", name, grade);
}
}
7. BTreeSet<T>
A BTreeSet<T>
is a collection of unique elements, implemented as a B-tree. Similar to BTreeMap
, it keeps its elements sorted, providing O(log N) performance for insertions, deletions, and membership tests. It's the set equivalent of BTreeMap
2.
-
Common Use Cases:
- Storing distinct items that need to be kept in sorted order.
- Performing set operations on sorted data.
- Efficient range queries on sets.
-
Code Example:
use std::collections::BTreeSet;
fn main() {
let mut sorted_words: BTreeSet<String> = BTreeSet::new();
// Insert elements
sorted_words.insert(String::from("apple"));
sorted_words.insert(String::from("zebra"));
sorted_words.insert(String::from("banana"));
sorted_words.insert(String::from("apple")); // Duplicate, ignored
println!("Sorted words set: {:?}", sorted_words); // Output: {"apple", "banana", "zebra"} (sorted)
// Check for presence
if sorted_words.contains("banana") {
println!("'banana' is in the set.");
}
// Iterate over elements (always sorted)
for word in &sorted_words {
print!("{} ", word); // Output: apple banana zebra
}
println!();
}
8. BinaryHeap<T>
A BinaryHeap<T>
is a priority queue implemented as a max-heap. This means that the largest element is always at the top (root) and can be retrieved in O(1) time. Pushing and popping elements take O(log N) time 23.
-
Common Use Cases:
- Implementing a priority queue, where you always want to process the "most important" or "largest" element first 2.
- Algorithms like Dijkstra's shortest path or Prim's minimum spanning tree.
- Finding the k-largest elements in a stream of data.
-
Code Example:
use std::collections::BinaryHeap;
fn main() {
let mut heap: BinaryHeap<i32> = BinaryHeap::new();
// Push elements
heap.push(10);
heap.push(5);
heap.push(20);
heap.push(1);
println!("Heap after pushes: {:?}", heap); // Output: [20, 10, 5, 1] (internal representation, not necessarily sorted)
// Pop elements (always returns the largest)
let max1 = heap.pop();
println!("Popped max: {:?}", max1); // Output: Some(20)
println!("Heap after first pop: {:?}", heap); // Output: [10, 5, 1]
let max2 = heap.pop();
println!("Popped max: {:?}", max2); // Output: Some(10)
println!("Heap after second pop: {:?}", heap); // Output: [5, 1]
}
Using a Custom Hashing Algorithm
By default, HashMap
and HashSet
in Rust use a cryptographically secure, but potentially slower, hashing algorithm (SipHash
). For certain applications, especially those not exposed to untrusted input where hash collision attacks are not a concern, you might want to use a faster, non-cryptographic hasher.
To use a custom hashing algorithm, you need to provide an implementation of the BuildHasher
trait. A common choice for a faster non-cryptographic hash is the fnv
crate.
Steps to use a custom hasher:
-
Add the custom hasher crate to your
Cargo.toml
:[dependencies]
fnv = "1.0" -
Import and use the custom hasher:
use std::collections::HashMap;
use fnv::FnvBuildHasher;
fn main() {
// Create a HashMap using FnvBuildHasher
let mut map: HashMap<String, i32, FnvBuildHasher> = HashMap::with_hasher(FnvBuildHasher::default());
map.insert("apple".to_string(), 1);
map.insert("banana".to_string(), 2);
println!("Map with FNV hasher: {:?}", map);
// You can also create a HashSet with a custom hasher
use std::collections::HashSet;
use fnv::FnvHashSet; // FnvHashSet is a type alias for HashSet<T, FnvBuildHasher>
let mut set: FnvHashSet<i32> = FnvHashSet::default();
set.insert(10);
set.insert(20);
println!("Set with FNV hasher: {:?}", set);
}
When to consider a custom hasher:
- Performance-critical applications: If profiling shows that hashing is a bottleneck and your keys are simple types (like integers or short strings).
- Known data: When you are absolutely certain that the input data to your
HashMap
/HashSet
is not malicious and will not attempt to cause hash collisions. - Custom types: If you have a custom type that implements
Hash
andEq
, and you want to optimize its hashing behavior beyond the default.
Caution: Using a non-cryptographic hasher like FNV can make your HashMap
or HashSet
vulnerable to denial-of-service attacks if an attacker can control the keys inserted into the map. They could craft keys that cause many hash collisions, degrading performance from O(1) to O(N). Always stick with the default hasher unless you fully understand the security implications.
Conclusion
Rust's collection types provide powerful and efficient ways to manage data. While Vec
and HashMap
are excellent starting points for most tasks, understanding the unique characteristics of VecDeque
, LinkedList
, BTreeMap
, BTreeSet
, and BinaryHeap
allows you to select the most appropriate data structure for optimal performance and correctness in specific scenarios. Remember to consider the trade-offs between memory usage, access patterns, and performance characteristics when making your choice.