Class: RedAmber::DataFrame

Inherits:
Object
Includes:
DataFrameCombinable, DataFrameDisplayable, DataFrameIndexable, DataFrameLoadSave, DataFrameReshaping, DataFrameSelectable, DataFrameVariableOperation, Helper
Defined in:
lib/red_amber/data_frame.rb

Overview

Class to represent a data frame. Variable @table holds an Arrow::Table object.

Constant Summary

Constants included from DataFrameDisplayable

RedAmber::DataFrameDisplayable::INDEX_KEY

Instance Attribute Summary

Class Method Summary

Instance Method Summary

Methods included from DataFrameVariableOperation

#assign, #assign_left, #drop, #pick, #rename

Methods included from DataFrameSelectable

#[], #filter, #first, #head, #last, #remove, #remove_nil, #slice, #slice_by, #tail, #take, #v

Methods included from DataFrameReshaping

#to_long, #to_wide, #transpose

Methods included from DataFrameLoadSave

#auto_cast, included, #save

Methods included from DataFrameIndexable

#map_indices, #sort, #sort_indices

Methods included from DataFrameDisplayable

#inspect, #summary, #tdr, #tdr_str, #to_iruby, #to_s

Methods included from DataFrameCombinable

#anti_join, #concatenate, #difference, #full_join, #inner_join, #intersect, #join, #left_join, #merge, #right_join, #semi_join, #set_operable?, #union

Constructor Details

#initialize(table) ⇒ DataFrame
#initialize(arrowable) ⇒ DataFrame
#initialize(rover_like) ⇒ DataFrame
#initialize ⇒ DataFrame
#initialize(empty) ⇒ DataFrame
#initialize(args) ⇒ DataFrame

Creates a new DataFrame.

Overloads:

  • #initialize(table) ⇒ DataFrame

    Initializes a DataFrame from an `Arrow::Table`.

    Parameters:

    • table (Arrow::Table)

      The table to hold in the DataFrame.

  • #initialize(arrowable) ⇒ DataFrame
    Note:

    `RedAmber::DataFrame` itself is accepted by this overload.

    Note:

    Hash is refined to respond to `#to_arrow` in this class.

    Initializes a DataFrame from an object that responds to `#to_arrow`.

    Parameters:

    • arrowable (#to_arrow)

      Any object which responds to `#to_arrow`. `#to_arrow` must return an `Arrow::Table`.

  • #initialize(rover_like) ⇒ DataFrame
    Note:

    `Rover::DataFrame` is accepted by this overload.

    Initializes a DataFrame from a `Rover::DataFrame`-like object that responds to `#to_h`.

    Parameters:

    • rover_like (#to_h)

      Any object which responds to `#to_h`. `#to_h` must return a Hash which is convertible by `Arrow::Table.new`.

  • #initialize ⇒ DataFrame

    Creates an empty DataFrame.

    Examples:

    DataFrame.new

  • #initialize(empty) ⇒ DataFrame

    Creates an empty DataFrame.

    Examples:

    DataFrame.new([]), DataFrame.new({}), DataFrame.new(nil)

    Parameters:

    • empty (nil, [], {})
  • #initialize(args) ⇒ DataFrame

    Parameters:



# File 'lib/red_amber/data_frame.rb', line 78

def initialize(*args)
  case args
  in nil | [nil] | [] | {} | [[]] | [{}]
    @table = Arrow::Table.new({}, [])
  in [Arrow::Table => table]
    @table = table
  in [arrowable] if arrowable.respond_to?(:to_arrow)
    table = arrowable.to_arrow
    unless table.is_a?(Arrow::Table)
      raise DataFrameTypeError,
            "to_arrow must return an Arrow::Table but #{table.class}: #{arrowable}"
    end
    @table = table
  in [rover_like] if rover_like.respond_to?(:to_h)
    begin
      # Accepts Rover::DataFrame
      @table = Arrow::Table.new(rover_like.to_h)
    rescue StandardError
      raise DataFrameTypeError, "to_h must return Arrowable object: #{rover_like}"
    end
  else
    begin
      @table = Arrow::Table.new(*args)
    rescue StandardError
      raise DataFrameTypeError, "invalid argument to create Arrow::Table: #{args}"
    end
  end

  name_unnamed_keys
  check_duplicate_keys(keys)
end
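
The branch order in the constructor above matters: an object that responds to both `#to_arrow` and `#to_h` is handled by the `#to_arrow` branch. The following is a minimal pure-Ruby sketch of that dispatch order, not RedAmber's code. `classify_args`, `FakeArrowable` and `RoverLike` are hypothetical names introduced only for illustration, and no Arrow table is built.

```ruby
# Hypothetical stand-in for an object that responds to #to_arrow.
# Struct instances also respond to #to_h, so this exercises branch order.
FakeArrowable = Struct.new(:payload) do
  def to_arrow
    payload
  end
end

# Hypothetical stand-in that responds to #to_h but not to #to_arrow.
RoverLike = Struct.new(:x, :y)

# Returns a Symbol naming the branch that a constructor shaped like the
# one above would take. It only classifies; it never touches Arrow.
def classify_args(*args)
  case args
  in [] | [nil] | [[]] | [{}]
    :empty
  in [arrowable] if arrowable.respond_to?(:to_arrow)
    :arrowable
  in [rover_like] if rover_like.respond_to?(:to_h)
    :rover_like
  else
    :fallback
  end
end

p classify_args                          # no arguments
p classify_args(nil)
p classify_args(FakeArrowable.new(:t))   # has #to_arrow and #to_h
p classify_args(RoverLike.new([1], [2])) # has #to_h only
p classify_args(1, 2)                    # anything else
```

In the real constructor the last branch hands the arguments to `Arrow::Table.new` and wraps any failure in `DataFrameTypeError`; the sketch only reports which branch is reached.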

Dynamic Method Handling

This class handles dynamic methods through the method_missing method. Calling a key as a method with no arguments returns that column via #v.

#method_missing(name, *args, &block) ⇒ Object



# File 'lib/red_amber/data_frame.rb', line 332

def method_missing(name, *args, &block)
  return v(name) if args.empty? && key?(name)

  super
end
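
As a rough illustration of this kind of key-as-method access, here is a minimal pure-Ruby sketch. `TinyFrame` is a hypothetical class that stores plain Arrays in a Hash; it is not RedAmber's implementation, and its `v` returns an Array rather than a Vector.

```ruby
# Hypothetical miniature frame: columns are plain Arrays keyed by Symbol.
class TinyFrame
  def initialize(columns)
    @columns = columns.transform_keys(&:to_sym)
  end

  def key?(key)
    @columns.key?(key.to_sym)
  end

  # Returns the column for the given key.
  def v(key)
    @columns.fetch(key.to_sym)
  end

  # A known key called with no arguments is treated as a column lookup.
  # Everything else falls through to the default NoMethodError.
  def method_missing(name, *args, &block)
    return v(name) if args.empty? && key?(name)

    super
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    key?(name) || super
  end
end

frame = TinyFrame.new(x: [1, 2, 3], 'y' => %w[A B C])
p frame.x
p frame.respond_to?(:y)
p frame.respond_to?(:z)
```

Defining `respond_to_missing?` alongside `method_missing` is the usual Ruby convention, so that `respond_to?` and `method` agree with what the object actually answers to.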

Instance Attribute Details

#table ⇒ Arrow::Table (readonly) Also known as: to_arrow

Returns the table held within.

Returns:

  • (Arrow::Table)

    The table within.



# File 'lib/red_amber/data_frame.rb', line 114

def table
  @table
end

Class Method Details

.create(table) ⇒ DataFrame

Note:

This method allocates the instance and sets the table directly, bypassing `#initialize`. It may be used inside other methods.

Note:

`table` must have unique keys.

Quicker DataFrame construction from an `Arrow::Table`.

Parameters:

  • table (Arrow::Table)

    The table to hold in the DataFrame.

Returns:

  • (DataFrame)



# File 'lib/red_amber/data_frame.rb', line 27

def self.create(table)
  instance = allocate
  instance.instance_variable_set(:@table, table)
  instance
end
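
The effect of `allocate` in `.create` is that `#initialize`, and any checks it performs, never runs. A minimal pure-Ruby sketch of the same pattern follows; `CheckedBox` is a hypothetical class used only to make the difference visible.

```ruby
# Hypothetical class whose regular constructor validates its argument.
class CheckedBox
  attr_reader :value

  def initialize(value)
    raise ArgumentError, 'value must not be nil' if value.nil?

    @value = value
  end

  # Fast path: allocate without running #initialize, then set state directly.
  # The caller is responsible for passing a valid value.
  def self.create(value)
    instance = allocate
    instance.instance_variable_set(:@value, value)
    instance
  end
end

box = CheckedBox.create(nil) # no validation happens here
p box.value
```

This is why the note above asks for a `table` with unique keys: a constructor built this way skips whatever validation `#initialize` would have done.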

Instance Method Details

#==(other) ⇒ true, false

Compare DataFrames.

Returns:

  • (true, false)

    True if other is a DataFrame and its table equals this one's table; otherwise false.



# File 'lib/red_amber/data_frame.rb', line 279

def ==(other)
  other.is_a?(DataFrame) && @table == other.table
end

#each_row ⇒ Enumerator
#each_row {|key_row_pairs| ... } ⇒ Object

Enumerates each row.

Overloads:

  • #each_row ⇒ Enumerator

    Returns an Enumerator when no block is given.

    Returns:

    • (Enumerator)

      Enumerator over the rows.

  • #each_row {|key_row_pairs| ... } ⇒ Object

    Yields a Hash of key and row value pairs for each row.

    Yields:

    • (key_row_pairs)

      A Hash of key and row value pairs.

    Yield Parameters:

    • key_row_pairs (Hash)

      Key and row value pairs.

    Yield Returns:

    • (Integer)

      Size of the DataFrame.



# File 'lib/red_amber/data_frame.rb', line 305

def each_row
  return enum_for(:each_row) unless block_given?

  size.times do |i|
    key_row_pairs =
      vectors.each_with_object({}) do |v, h|
        h[v.key] = v.data[i]
      end
    yield key_row_pairs
  end
end
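
The same row-by-row walk over column-oriented data can be sketched in pure Ruby. `RowSketch` is a hypothetical module that takes a plain Hash of `key => Array` instead of Vectors, and it assumes all columns have the same length.

```ruby
# Hypothetical helper: iterates column-oriented data one row at a time.
module RowSketch
  # columns: a Hash of key => Array, all arrays assumed to be equal length.
  def self.each_row(columns)
    # Without a block, hand back an Enumerator over the same rows.
    return enum_for(:each_row, columns) unless block_given?

    n_rows = columns.values.first&.size || 0
    n_rows.times do |i|
      # Build one Hash per row: key => value at row i.
      yield columns.transform_values { |column| column[i] }
    end
  end
end

columns = { x: [1, 2], y: %w[A B] }
RowSketch.each_row(columns) { |row| p row }
rows = RowSketch.each_row(columns).to_a
p rows
```

Because the loop is an `Integer#times` call, the block form returns the number of rows, while the blockless form returns an Enumerator that can be chained with `map`, `select` and similar methods.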

#empty? ⇒ true, false

Checks whether the DataFrame is empty.

Returns:

  • (true, false)

    True if it has no columns.



# File 'lib/red_amber/data_frame.rb', line 287

def empty?
  variables.empty?
end

#group(*group_keys, &block) ⇒ Object



# File 'lib/red_amber/data_frame.rb', line 326

def group(*group_keys, &block)
  g = Group.new(self, group_keys)
  g = g.summarize(&block) if block
  g
end

#indices(start = 0) ⇒ Vector Also known as: indexes

Returns row indices (start...(size + start)) in a Vector.

Examples:

(when self.size == 5)
- indices #=> Vector[0, 1, 2, 3, 4]
- indices(1) #=> Vector[1, 2, 3, 4, 5]
- indices('a') #=> Vector['a', 'b', 'c', 'd', 'e']

Parameters:

  • start (Object) (defaults to: 0)

    An object that has a `#succ` method.

Returns:

  • (Vector)

    A Vector of row indices.



# File 'lib/red_amber/data_frame.rb', line 237

def indices(start = 0)
  Vector.new((start..).take(size))
end
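
The index values come from an endless range that is cut off after `size` elements. A minimal pure-Ruby sketch of just that step is shown below; `row_indices` is a hypothetical helper that returns a plain Array instead of wrapping the result in a Vector.

```ruby
# Hypothetical helper: the first `size` successors of `start`,
# taken from an endless range. Works for anything Range can iterate.
def row_indices(size, start = 0)
  (start..).take(size)
end

p row_indices(5)      # => [0, 1, 2, 3, 4]
p row_indices(5, 1)   # => [1, 2, 3, 4, 5]
p row_indices(5, 'a') # => ["a", "b", "c", "d", "e"]
```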

#key?(key) ⇒ Boolean Also known as: has_key?

Returns true if self has the key given as the argument.

Parameters:

  • key (Symbol, String)

    Key to test.

Returns:

  • (Boolean)

    True if self has the key. The key is converted to a Symbol before the lookup.



# File 'lib/red_amber/data_frame.rb', line 177

def key?(key)
  keys.include?(key.to_sym)
end

#key_index(key) ⇒ Integer Also known as: find_index, index

Returns the index of the specified key in the Array of keys.

Parameters:

  • key (Symbol, String)

    Key to look up.

Returns:

  • (Integer)

    Index of key in the Array keys.



# File 'lib/red_amber/data_frame.rb', line 188

def key_index(key)
  keys.find_index(key.to_sym)
end
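
Both `#key?` and `#key_index` convert the argument with `to_sym`, so a String and a Symbol name the same key. A minimal pure-Ruby sketch of that lookup follows; `SKETCH_KEYS`, `sketch_key?` and `sketch_key_index` are hypothetical names, and the key list is made up for illustration.

```ruby
# Hypothetical key list standing in for DataFrame#keys.
SKETCH_KEYS = %i[x y z].freeze

# True if the key, normalized to a Symbol, is present.
def sketch_key?(key)
  SKETCH_KEYS.include?(key.to_sym)
end

# Position of the key in the list, or nil when it is absent.
def sketch_key_index(key)
  SKETCH_KEYS.find_index(key.to_sym)
end

p sketch_key?('y')      # => true
p sketch_key_index(:y)  # => 1
p sketch_key_index('z') # => 2
p sketch_key_index(:w)  # => nil
```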

#keys ⇒ Array Also known as: column_names, var_names

Returns an Array of keys.

Returns:

  • (Array)

    Keys in an Array.



# File 'lib/red_amber/data_frame.rb', line 165

def keys
  @keys || @keys = init_instance_vars(:keys)
end

#n_keys ⇒ Integer Also known as: n_variables, n_vars, n_cols

Returns the number of columns.

Returns:

  • (Integer)

    Number of columns.



# File 'lib/red_amber/data_frame.rb', line 133

def n_keys
  @table.n_columns
end

#respond_to_missing?(name, include_private) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/red_amber/data_frame.rb', line 338

def respond_to_missing?(name, include_private)
  return true if key?(name)

  super
end

#schema ⇒ Hash

Returns column name and data type in a Hash.

Examples:

RedAmber::DataFrame.new(x: [1, 2, 3], y: %w[A B C]).schema
# => {:x=>:uint8, :y=>:string}

Returns:

  • (Hash)

    Column name and data type.



# File 'lib/red_amber/data_frame.rb', line 269

def schema
  keys.zip(types).to_h
end

#shape ⇒ Array

Returns the numbers of rows and columns.

Returns:

  • (Array)

    Number of rows and number of columns in an array. Same as [size, n_keys].



# File 'lib/red_amber/data_frame.rb', line 146

def shape
  [size, n_keys]
end

#size ⇒ Integer Also known as: n_records, n_obs, n_rows

Returns the number of rows.

Returns:

  • (Integer)

    Number of rows.



# File 'lib/red_amber/data_frame.rb', line 122

def size
  @table.n_rows
end

#to_a ⇒ Array Also known as: raw_records

Note:

If you need a column-oriented array, use `.to_h.to_a`.

Returns a row-oriented array without header.

Returns:

  • (Array)

    Row-oriented data without header.



# File 'lib/red_amber/data_frame.rb', line 256

def to_a
  @table.raw_records
end

#to_h ⇒ Hash

Returns column-oriented data in a Hash.

Returns:

  • (Hash)

    A Hash of `key => column_in_an_array`.



# File 'lib/red_amber/data_frame.rb', line 246

def to_h
  variables.transform_values(&:to_a)
end
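
The difference between the row-oriented shape of `#to_a` and the column-oriented shape of `#to_h` (and `.to_h.to_a`) can be seen with plain Ruby data. The sketch below uses a made-up Hash of columns and does not call RedAmber or Arrow; it only shows the two layouts.

```ruby
# Made-up column-oriented data, shaped like the Hash #to_h describes.
columns = { x: [1, 2, 3], y: %w[A B C] }

# Row-oriented, without header: one inner Array per row.
row_oriented = columns.values.transpose
p row_oriented    # => [[1, "A"], [2, "B"], [3, "C"]]

# Column-oriented Array: one [key, column] pair per column.
column_oriented = columns.to_a
p column_oriented # => [[:x, [1, 2, 3]], [:y, ["A", "B", "C"]]]
```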

#to_rover ⇒ Rover::DataFrame

Returns self as a `Rover::DataFrame`.

Returns:

  • (Rover::DataFrame)

    A `Rover::DataFrame`.



# File 'lib/red_amber/data_frame.rb', line 321

def to_rover
  require 'rover'
  Rover::DataFrame.new(to_h)
end

#type_classes ⇒ Array

Returns an Array of data type classes.

Returns:

  • (Array)

    An Array of Red Arrow data type Classes.



# File 'lib/red_amber/data_frame.rb', line 210

def type_classes
  @data_types || @data_types = @table.columns.map { |column| column.data_type.class }
end

#types ⇒ Array

Returns abbreviated type names in an Array.

Returns:

  • (Array)

    Abbreviated Red Arrow data type names.



# File 'lib/red_amber/data_frame.rb', line 199

def types
  @types || @types = @table.columns.map do |column|
    column.data.value_type.nick.to_sym
  end
end

#variables ⇒ Hash Also known as: vars

Returns a Hash of key and Vector pairs for the columns.

Returns:

  • (Hash)

    `key => Vector` pairs, one for each column.



# File 'lib/red_amber/data_frame.rb', line 155

def variables
  @variables || @variables = init_instance_vars(:variables)
end

#vectors ⇒ Array

Returns Vectors in an Array.

Returns:

  • (Array)

    An Array of `RedAmber::Vector`s.



# File 'lib/red_amber/data_frame.rb', line 219

def vectors
  @vectors || @vectors = init_instance_vars(:vectors)
end