class Llama::Memory
Inheritance hierarchy:
- Llama::Memory
- Reference
- Object
Overview
Modern memory management for llama.cpp contexts
This class provides a unified interface to various memory types:
- Standard KV cache (llama_kv_cache_unified)
- SWA (Sliding Window Attention) cache (llama_kv_cache_unified_iswa)
- Recurrent layer memory (llama_memory_recurrent)
- Hybrid attention/recurrent models (llama_memory_hybrid)
The Memory API replaces the deprecated KV cache API and provides better support for modern model architectures.
Defined in:
llama/memory.cr
llama/memory/error.cr
Constructors
-
.new(ctx : Context)
Creates a new Memory instance from a context
Instance Method Summary
-
#can_shift? : Bool
Check if memory supports shifting
-
#clear(data : Bool = false) : self
Clear memory contents
-
#seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : self
Add relative position delta to tokens in sequence
-
#seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32) : self
Copy tokens from one sequence to another
-
#seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32) : self
Divide positions of tokens in sequence by factor
-
#seq_keep(seq_id : Int32) : self
Keep only specified sequence, remove all others
-
#seq_pos_max(seq_id : Int32) : Int32
Get maximum position in sequence
-
#seq_pos_min(seq_id : Int32) : Int32
Get minimum position in sequence
-
#seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
Remove tokens from sequence in specified position range
-
#to_unsafe : Pointer(Void)
Get raw pointer for internal use
Constructor Detail
.new(ctx : Context)
Creates a new Memory instance from a context
Parameters:
- ctx: The context to get memory from
Raises:
- Memory::Error if memory handle cannot be obtained
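Example (a minimal sketch): ctx is assumed to be an already-constructed Llama::Context; see the Context documentation for how to create one.
  begin
    # Obtain the memory handle attached to an existing context.
    memory = Llama::Memory.new(ctx)
  rescue e : Llama::Memory::Error
    STDERR.puts "No memory handle available: #{e.message}"
  end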
Instance Method Detail
#can_shift? : Bool
Check if memory supports shifting
Returns:
- true if shifting is supported, false otherwise
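Example (sketch, assuming memory was obtained as in the constructor example above):
  # Guard position-shifting operations; some memory types do not support them.
  if memory.can_shift?
    puts "this context's memory supports position shifts"
  end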
#clear(data : Bool = false) : self
Clear memory contents
Parameters:
- data: If true, data buffers will also be cleared together with metadata (default: false)
Returns:
- self for method chaining
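Example (sketch, same memory assumption as above):
  memory.clear              # reset metadata only
  memory.clear(data: true)  # additionally wipe the underlying data buffers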
#seq_add(seq_id : Int32, p0 : Int32, p1 : Int32, delta : Int32) : self
Add relative position delta to tokens in sequence
Parameters:
- seq_id: Sequence ID
- p0: Start position (< 0 for [0, p1])
- p1: End position (< 0 for [p0, inf))
- delta: Position delta to add
Returns:
- self for method chaining
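A sketch of the common "context shift" pattern, assuming the usual llama.cpp convention that p1 is exclusive and that the memory supports shifting (see #can_shift?): drop the oldest tokens of sequence 0, then shift the remaining positions back so they start at 0 again. n_discard is a hypothetical count chosen by the caller.
  n_discard = 32
  memory.seq_rm(0, 0, n_discard)                # drop positions [0, n_discard)
  memory.seq_add(0, n_discard, -1, -n_discard)  # shift the remainder back to 0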
#seq_cp(seq_id_src : Int32, seq_id_dst : Int32, p0 : Int32, p1 : Int32) : self
Copy tokens from one sequence to another
Parameters:
- seq_id_src: Source sequence ID
- seq_id_dst: Destination sequence ID
- p0: Start position (< 0 for [0, p1])
- p1: End position (< 0 for [p0, inf))
Returns:
- self for method chaining
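Example (sketch): duplicating a shared prompt prefix from sequence 0 into sequence 1 so a second completion can branch from it.
  memory.seq_cp(0, 1, 0, -1)  # copy all of sequence 0 into sequence 1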
#seq_div(seq_id : Int32, p0 : Int32, p1 : Int32, d : Int32) : self
Divide positions of tokens in sequence by factor
Parameters:
- seq_id: Sequence ID
- p0: Start position (< 0 for [0, p1])
- p1: End position (< 0 for [p0, inf))
- d: Divisor (must be > 1)
Returns:
- self for method chaining
Raises:
- ArgumentError if divisor is <= 1
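Example (sketch, same assumptions as above):
  memory.seq_div(0, 0, -1, 2)  # halve every position in sequence 0
  # A divisor of 1 or less raises ArgumentError.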
#seq_keep(seq_id : Int32) : self
Keep only specified sequence, remove all others
Parameters:
- seq_id: Sequence ID to keep
Returns:
- self for method chaining
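Example (sketch): after decoding several candidate sequences in parallel, keep only the winner.
  best_seq_id = 2  # hypothetical: the sequence chosen to survive
  memory.seq_keep(best_seq_id)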
#seq_pos_max(seq_id : Int32) : Int32
Get maximum position in sequence
All positions in the range [pos_min, pos_max] are guaranteed to be present.
Parameters:
- seq_id: Sequence ID
Returns:
- Maximum position, or -1 if sequence is empty
#seq_pos_min(seq_id : Int32) : Int32
Get minimum position in sequence
This is typically non-zero only for SWA (Sliding Window Attention) caches. All positions in the range [pos_min, pos_max] are guaranteed to be present.
Parameters:
- seq_id: Sequence ID
Returns:
- Minimum position, or -1 if sequence is empty
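Example (sketch) combining both position queries, with memory as above:
  max = memory.seq_pos_max(0)
  if max == -1
    puts "sequence 0 is empty"
  else
    min = memory.seq_pos_min(0)
    puts "sequence 0 covers positions #{min}..#{max}"
  end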
#seq_rm(seq_id : Int32, p0 : Int32, p1 : Int32) : Bool
Remove tokens from sequence in specified position range
Parameters:
- seq_id: Sequence ID (< 0 to match any sequence)
- p0: Start position (< 0 for [0, p1])
- p1: End position (< 0 for [p0, inf))
Returns:
- true if successful, false if partial sequence cannot be removed
Note: Removing a whole sequence never fails
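Example (sketch): removing the tail of sequence 0 after a given cut-off position, falling back to dropping the whole sequence if partial removal is not supported; cut_off is a hypothetical position chosen by the caller.
  cut_off = 128
  unless memory.seq_rm(0, cut_off, -1)
    # Partial removal may be unsupported by some memory types; removing a
    # whole sequence never fails, so drop sequence 0 entirely instead.
    memory.seq_rm(0, -1, -1)
  end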