Class: RedAmber::DataFrame
- Inherits: Object (hierarchy: Object > RedAmber::DataFrame)
- Includes: DataFrameCombinable, DataFrameDisplayable, DataFrameIndexable, DataFrameLoadSave, DataFrameReshaping, DataFrameSelectable, DataFrameVariableOperation, Helper
- Defined in: lib/red_amber/data_frame.rb
Overview
Class to represent a data frame. The instance variable @table holds an Arrow::Table object.
Constant Summary
Constants included from DataFrameDisplayable
RedAmber::DataFrameDisplayable::INDEX_KEY
Instance Attribute Summary
- #table ⇒ Arrow::Table (also: #to_arrow) (readonly)
  Returns the Arrow::Table held within.
Class Method Summary
- .create(table) ⇒ DataFrame
  Quicker DataFrame construction from an `Arrow::Table`.
Instance Method Summary
- #==(other) ⇒ true, false
  Compares DataFrames.
- #each_row ⇒ Object
  Enumerates each row.
- #empty? ⇒ true, false
  Checks whether the DataFrame is empty.
- #group(*group_keys, &block) ⇒ Object
- #indices(start = 0) ⇒ Vector (also: #indexes)
  Returns row indices (start...(start + size)) in a Vector.
- #initialize(*args) ⇒ DataFrame (constructor)
  Creates a new DataFrame.
- #key?(key) ⇒ Boolean (also: #has_key?)
  Returns true if self has the specified key.
- #key_index(key) ⇒ Integer (also: #find_index, #index)
  Returns the index of the specified key in keys.
- #keys ⇒ Array (also: #column_names, #var_names)
  Returns an Array of keys.
- #method_missing(name, *args, &block) ⇒ Object
- #n_keys ⇒ Integer (also: #n_variables, #n_vars, #n_cols)
  Returns the number of columns.
- #respond_to_missing?(name, include_private) ⇒ Boolean
- #schema ⇒ Hash
  Returns column names and their data types in a Hash.
- #shape ⇒ Array
  Returns the number of rows and the number of columns.
- #size ⇒ Integer (also: #n_records, #n_obs, #n_rows)
  Returns the number of rows.
- #to_a ⇒ Array (also: #raw_records)
  Returns a row-oriented array without a header.
- #to_h ⇒ Hash
  Returns column-oriented data in a Hash.
- #to_rover ⇒ Rover::DataFrame
  Returns self as a `Rover::DataFrame`.
- #type_classes ⇒ Array
  Returns an Array of the data type classes of the columns.
- #types ⇒ Array
  Returns abbreviated type names in an Array.
- #variables ⇒ Hash (also: #vars)
  Returns a Hash of key and Vector pairs for the columns.
- #vectors ⇒ Array
  Returns Vectors in an Array.
Methods included from DataFrameVariableOperation
#assign, #assign_left, #drop, #pick, #rename
Methods included from DataFrameSelectable
#[], #filter, #first, #head, #last, #remove, #remove_nil, #slice, #slice_by, #tail, #take, #v
Methods included from DataFrameReshaping
#to_long, #to_wide, #transpose
Methods included from DataFrameLoadSave
Methods included from DataFrameIndexable
#map_indices, #sort, #sort_indices
Methods included from DataFrameDisplayable
#inspect, #summary, #tdr, #tdr_str, #to_iruby, #to_s
Methods included from DataFrameCombinable
#anti_join, #concatenate, #difference, #full_join, #inner_join, #intersect, #join, #left_join, #merge, #right_join, #semi_join, #set_operable?, #union
Constructor Details
#initialize(table) ⇒ DataFrame
#initialize(arrowable) ⇒ DataFrame
#initialize(rover_like) ⇒ DataFrame
#initialize ⇒ DataFrame
#initialize(empty) ⇒ DataFrame
#initialize(args) ⇒ DataFrame
Creates a new DataFrame.
# File 'lib/red_amber/data_frame.rb', line 78
def initialize(*args)
  case args
  in nil | [nil] | [] | {} | [[]] | [{}]
    @table = Arrow::Table.new({}, [])
  in [Arrow::Table => table]
    @table = table
  in [arrowable] if arrowable.respond_to?(:to_arrow)
    table = arrowable.to_arrow
    unless table.is_a?(Arrow::Table)
      raise DataFrameTypeError, "to_arrow must return an Arrow::Table but #{table.class}: #{arrowable}"
    end
    @table = table
  in [rover_like] if rover_like.respond_to?(:to_h)
    begin
      # Accepts Rover::DataFrame
      @table = Arrow::Table.new(rover_like.to_h)
    rescue StandardError
      raise DataFrameTypeError, "to_h must return Arrowable object: #{rover_like}"
    end
  else
    begin
      @table = Arrow::Table.new(*args)
    rescue StandardError
      raise DataFrameTypeError, "invalid argument to create Arrow::Table: #{args}"
    end
  end
  name_unnamed_keys
  check_duplicate_keys(keys)
end
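The constructor chooses among these overloads with Ruby's `case`/`in` pattern matching on the argument array, trying each branch in order. As a rough, runnable illustration that needs no red-arrow install, the sketch below keeps only that branch order and returns a label instead of building a table. `classify_args` and `FakeTable` are hypothetical names (the latter stands in for `Arrow::Table`), and Ruby 3.0 or later is assumed.

```ruby
# Hypothetical stand-in for Arrow::Table (not the real class).
class FakeTable; end

# Mirrors the branch order of DataFrame#initialize, but returns a label
# instead of building a table. Requires Ruby 3.0+ for case/in.
def classify_args(*args)
  case args
  in [] | [nil] | [[]] | [{}]
    :empty                 # no data: an empty table would be created
  in [FakeTable]
    :table                 # already a table: used as-is
  in [obj] if obj.respond_to?(:to_arrow)
    :arrowable             # converted with #to_arrow
  in [obj] if obj.respond_to?(:to_h)
    :rover_like            # anything with #to_h, such as a Hash here
  else
    :other                 # passed through to the table constructor
  end
end

arrowable = Object.new
def arrowable.to_arrow
  FakeTable.new
end

classify_args                     # => :empty
classify_args(FakeTable.new)      # => :table
classify_args(arrowable)          # => :arrowable
classify_args({ x: [1, 2] })      # => :rover_like
classify_args([:x, :y], [[1, 2]]) # => :other
```

Because the branches are tried top to bottom, an object that defines both `to_arrow` and `to_h` is handled by the `to_arrow` branch.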
Dynamic Method Handling
This class handles dynamic methods through the method_missing method
#method_missing(name, *args, &block) ⇒ Object
# File 'lib/red_amber/data_frame.rb', line 332
def method_missing(name, *args, &block)
  return v(name) if args.empty? && key?(name)

  super
end
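This hook lets a column be read as if it were a method, and `respond_to_missing?` keeps `respond_to?` in agreement with it. A minimal self-contained sketch of the same pairing, assuming a plain Hash of columns in place of an Arrow table; `MiniFrame` is hypothetical and its `v` only imitates the DataFrame accessor of that name.

```ruby
# Hypothetical MiniFrame: stores columns in a Hash and exposes them as
# methods, using the same method_missing/respond_to_missing? pairing.
class MiniFrame
  def initialize(columns)
    @columns = columns.transform_keys(&:to_sym)
  end

  def key?(key)
    @columns.key?(key.to_sym)
  end

  # Column accessor by key (stands in for DataFrame#v).
  def v(key)
    @columns.fetch(key.to_sym)
  end

  # A bare call such as frame.price resolves to a column when one exists.
  def method_missing(name, *args, &block)
    return v(name) if args.empty? && key?(name)

    super
  end

  # Keeps respond_to? (and Object#method) consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    return true if key?(name)

    super
  end
end

frame = MiniFrame.new({ "price" => [100, 250], "item" => %w[pen ink] })
frame.price                # => [100, 250]
frame.respond_to?(:item)   # => true
frame.respond_to?(:weight) # => false
```

Ruby consults `method_missing` only for names that are not already methods, so a column named like an existing method is shadowed. In DataFrame that would apply to a column called, for example, `size` or `keys`, which would still have to be read with `v`.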
Instance Attribute Details
#table ⇒ Arrow::Table (readonly) Also known as: to_arrow
Returns the Arrow::Table held within.
# File 'lib/red_amber/data_frame.rb', line 114
def table
  @table
end
Class Method Details
.create(table) ⇒ DataFrame
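Quicker DataFrame construction from an `Arrow::Table`.
Note: This method assigns `table` directly without calling #initialize, so it is intended for use inside methods where the table is already known to be valid.
Note: `table` must have unique keys, because the duplicate-key check performed by #initialize is skipped.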
# File 'lib/red_amber/data_frame.rb', line 27
def self.create(table)
  instance = allocate
  instance.instance_variable_set(:@table, table)
  instance
end
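`.create` skips `#initialize` by calling `allocate` and setting `@table` directly, which is why the unique-key requirement falls on the caller: the duplicate-key check that `#initialize` performs never runs. The hypothetical `Box` class below shows the same technique in plain Ruby; its non-negative check is an invented stand-in for that validation.

```ruby
# Hypothetical Box: #initialize validates, while Box.create bypasses it
# the same way DataFrame.create does (allocate + instance_variable_set).
class Box
  attr_reader :value

  def initialize(value)
    raise ArgumentError, "value must be non-negative" if value.negative?

    @value = value
  end

  # Fast path: #initialize is never called, so no validation runs.
  def self.create(value)
    instance = allocate
    instance.instance_variable_set(:@value, value)
    instance
  end
end

Box.new(3).value     # => 3
Box.create(-1).value # => -1 (invalid, but accepted because #initialize was skipped)
```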
Instance Method Details
#==(other) ⇒ true, false
Compares DataFrames. Returns true if other is a DataFrame and its table equals self's table.
# File 'lib/red_amber/data_frame.rb', line 279
def ==(other)
  other.is_a?(DataFrame) && @table == other.table
end
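#each_row ⇒ Enumerator
#each_row {|key_row_pairs| ... } ⇒ Object
Enumerates each row. Without a block, returns an Enumerator; with a block, yields a Hash of key => value pairs for each row.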
# File 'lib/red_amber/data_frame.rb', line 305
def each_row
  return enum_for(:each_row) unless block_given?

  size.times do |i|
    key_row_pairs =
      vectors.each_with_object({}) do |v, h|
        h[v.key] = v.data[i]
      end
    yield key_row_pairs
  end
end
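Internally, `#each_row` walks the row positions and builds one Hash per row from the column Vectors, falling back to `enum_for` when no block is given. The sketch below reproduces that loop over a plain column-oriented Hash; `ColumnStore` is hypothetical and uses Arrays where DataFrame uses Vectors.

```ruby
# Hypothetical ColumnStore: same loop shape as DataFrame#each_row,
# yielding one Hash of key => value per row, or an Enumerator without a block.
class ColumnStore
  def initialize(columns)
    @columns = columns
  end

  # Number of rows, taken from the first column.
  def size
    @columns.empty? ? 0 : @columns.values.first.size
  end

  def each_row
    return enum_for(:each_row) unless block_given?

    size.times do |i|
      row = @columns.each_with_object({}) { |(key, values), h| h[key] = values[i] }
      yield row
    end
  end
end

store = ColumnStore.new({ x: [1, 2], y: %w[a b] })
store.each_row.to_a # => [{ x: 1, y: "a" }, { x: 2, y: "b" }]
```

Returning `enum_for(:each_row)` means the usual Enumerable chain (`each_row.to_a`, `each_row.select { ... }`) works without a block.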
#empty? ⇒ true, false
Checks whether the DataFrame is empty (has no variables).
# File 'lib/red_amber/data_frame.rb', line 287
def empty?
  variables.empty?
end
#group(*group_keys, &block) ⇒ Object
# File 'lib/red_amber/data_frame.rb', line 326
def group(*group_keys, &block)
  g = Group.new(self, group_keys)
  g = g.summarize(&block) if block
  g
end
#indices(start = 0) ⇒ Vector Also known as: indexes
Returns row indices (start...(start + size)) in a Vector.
# File 'lib/red_amber/data_frame.rb', line 237
def indices(start = 0)
  Vector.new((start..).take(size))
end
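The index range comes from an endless Range: `(start..)` counts upward without limit and `take(size)` stops after one value per row. This plain-Ruby sketch returns an Array where DataFrame wraps the result in a `Vector`; `row_indices` is a hypothetical helper, and Ruby 2.6 or later is assumed for the endless Range syntax.

```ruby
# Hypothetical helper: the core expression of DataFrame#indices without
# the Vector wrapper. Takes `size` integers starting at `start`.
def row_indices(size, start = 0)
  (start..).take(size)
end

row_indices(3)    # => [0, 1, 2]
row_indices(3, 1) # => [1, 2, 3]
```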
#key?(key) ⇒ Boolean Also known as: has_key?
Returns true if self has the specified key. The argument is converted with to_sym before lookup.
# File 'lib/red_amber/data_frame.rb', line 177
def key?(key)
  keys.include?(key.to_sym)
end
#key_index(key) ⇒ Integer Also known as: find_index, index
Returns the index of the specified key in keys, or nil if the key is not present.
# File 'lib/red_amber/data_frame.rb', line 188
def key_index(key)
  keys.find_index(key.to_sym)
end
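Both `#key?` and `#key_index` convert the argument with `to_sym` before searching, so Strings and Symbols are interchangeable, and `find_index` yields nil for a missing key. A small sketch, assuming a fixed Symbol key list in place of `DataFrame#keys`; the `KeyLookup` module is hypothetical.

```ruby
# Hypothetical stand-alone versions of DataFrame#key? and #key_index.
module KeyLookup
  KEYS = %i[name age city].freeze # stands in for DataFrame#keys

  module_function

  def key?(key)
    KEYS.include?(key.to_sym)
  end

  def key_index(key)
    KEYS.find_index(key.to_sym) # nil when the key is absent
  end
end

KeyLookup.key?("age")         # => true
KeyLookup.key_index(:city)    # => 2
KeyLookup.key_index("height") # => nil
```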
#keys ⇒ Array Also known as: column_names, var_names
Returns an Array of keys.
# File 'lib/red_amber/data_frame.rb', line 165
def keys
  @keys || @keys = init_instance_vars(:keys)
end
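`#keys` (like `#variables`, `#vectors`, `#types` and `#type_classes`) uses the `@ivar || @ivar = value` idiom, which behaves like `@ivar ||= value`: the value is computed on the first call and reused afterwards. The hypothetical `Cached` class below counts how often the computation runs. The idiom assumes the underlying data does not change after construction, which is consistent with `table` being read-only; a cached `nil` or `false` would be recomputed on every call.

```ruby
# Hypothetical Cached class illustrating the memoization idiom.
class Cached
  attr_reader :computations

  def initialize(columns)
    @columns = columns
    @computations = 0
  end

  def keys
    @keys || @keys = compute_keys # same effect as: @keys ||= compute_keys
  end

  private

  # Counts calls so the caching is observable.
  def compute_keys
    @computations += 1
    @columns.keys.map(&:to_sym)
  end
end

c = Cached.new({ "a" => [1], "b" => [2] })
c.keys         # => [:a, :b]
c.keys         # => [:a, :b]
c.computations # => 1
```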
#n_keys ⇒ Integer Also known as: n_variables, n_vars, n_cols
Returns the number of columns.
# File 'lib/red_amber/data_frame.rb', line 133
def n_keys
  @table.n_columns
end
#respond_to_missing?(name, include_private) ⇒ Boolean
# File 'lib/red_amber/data_frame.rb', line 338
def respond_to_missing?(name, include_private)
  return true if key?(name)

  super
end
#schema ⇒ Hash
Returns column names and their data types in a Hash.
# File 'lib/red_amber/data_frame.rb', line 269
def schema
  keys.zip(types).to_h
end
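`#schema` simply pairs each key with the type at the same position. With hypothetical key and type lists standing in for `#keys` and `#types` (the type symbols are illustrative), the same expression gives:

```ruby
# Hypothetical inputs standing in for DataFrame#keys and #types.
keys  = %i[name age score]
types = %i[string uint8 double]

# Pair position by position, then build a Hash.
schema = keys.zip(types).to_h
schema # => { name: :string, age: :uint8, score: :double }
```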
#shape ⇒ Array
Returns the number of rows and the number of columns as [size, n_keys].
# File 'lib/red_amber/data_frame.rb', line 146
def shape
  [size, n_keys]
end
#size ⇒ Integer Also known as: n_records, n_obs, n_rows
Returns the number of rows.
# File 'lib/red_amber/data_frame.rb', line 122
def size
  @table.n_rows
end
#to_a ⇒ Array Also known as: raw_records
Returns a row-oriented array without a header.
Note: If you need a column-oriented array, use `.to_h.to_a`.
# File 'lib/red_amber/data_frame.rb', line 256
def to_a
  @table.raw_records
end
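The note on `#to_a` distinguishes two shapes of nested Array. The sketch below builds both from a hypothetical column-oriented Hash like the one `#to_h` returns; since the real `#to_a` delegates to Arrow's `raw_records`, `transpose` is used here only as a plain-Ruby stand-in.

```ruby
# Hypothetical column-oriented data, shaped like the result of DataFrame#to_h.
columns = { name: %w[Ann Bob], age: [30, 41] }

# Row-oriented, no header (the shape #to_a returns): one Array per row.
rows = columns.values.transpose
rows # => [["Ann", 30], ["Bob", 41]]

# Column-oriented (what `.to_h.to_a` gives): one [key, values] pair per column.
pairs = columns.to_a
pairs # => [[:name, ["Ann", "Bob"]], [:age, [30, 41]]]
```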
#to_h ⇒ Hash
Returns column-oriented data in a Hash.
# File 'lib/red_amber/data_frame.rb', line 246
def to_h
  variables.transform_values(&:to_a)
end
#to_rover ⇒ Rover::DataFrame
Returns self as a `Rover::DataFrame`. Rover must be installed, since it is loaded with `require 'rover'`.
# File 'lib/red_amber/data_frame.rb', line 321
def to_rover
  require 'rover'
  Rover::DataFrame.new(to_h)
end
#type_classes ⇒ Array
Returns an Array of the data type classes of the columns.
# File 'lib/red_amber/data_frame.rb', line 210
def type_classes
  @data_types || @data_types = @table.columns.map { |column| column.data_type.class }
end
#types ⇒ Array
Returns abbreviated type names in an Array.
# File 'lib/red_amber/data_frame.rb', line 199
def types
  @types || @types = @table.columns.map do |column|
    column.data.value_type.nick.to_sym
  end
end
#variables ⇒ Hash Also known as: vars
Returns a Hash of key and Vector pairs in the columns.
# File 'lib/red_amber/data_frame.rb', line 155
def variables
  @variables || @variables = init_instance_vars(:variables)
end
#vectors ⇒ Array
Returns Vectors in an Array.
# File 'lib/red_amber/data_frame.rb', line 219
def vectors
  @vectors || @vectors = init_instance_vars(:vectors)
end