Module: RedAmber::DataFrameCombinable
- Included in: DataFrame
- Defined in: lib/red_amber/data_frame_combinable.rb
Overview
Mix-in for the class DataFrame.
Instance Method Summary
- #anti_join(other, join_keys = nil, suffix: '.1') ⇒ Object
  Return records of self that do not have a match in other.
- #concatenate(*other) ⇒ DataFrame (also: #concat, #bind_rows)
  Concatenate other dataframes or tables onto the bottom of self.
- #difference(other) ⇒ DataFrame (also: #setdiff)
  Select records appearing in self but not in other.
- #full_join(other, join_keys = nil, suffix: '.1') ⇒ Object (also: #outer_join)
  Join another DataFrame or Table, leaving all records.
- #inner_join(other, join_keys = nil, suffix: '.1') ⇒ Object
  Join another DataFrame or Table, leaving only the matching records.
- #intersect(other) ⇒ DataFrame
  Select records appearing in both self and other.
- #join(other, join_keys = nil, type: :inner, suffix: '.1') ⇒ Object
- #left_join(other, join_keys = nil, suffix: '.1') ⇒ Object
  Join matching values to self from other.
- #merge(*other) ⇒ DataFrame (also: #bind_cols)
  Merge other DataFrames or Tables.
- #right_join(other, join_keys = nil, suffix: '.1') ⇒ Object
  Join matching values from self to other.
- #semi_join(other, join_keys = nil, suffix: '.1') ⇒ Object
  Return records of self that have a match in other.
- #set_operable?(other) ⇒ Boolean
  Check if set operation with self and other is possible.
- #union(other) ⇒ DataFrame
  Select records appearing in self or other.
Instance Method Details
#anti_join(other, suffix: '.1') ⇒ DataFrame
#anti_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#anti_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Return records of self that do not have a match in other.
- Same as `#join` with `type: :left_anti`.
- A kind of filtering join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 515
    def anti_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :left_anti, suffix: suffix)
    end
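The filtering behaviour can be pictured without RedAmber at all. The following is a minimal plain-Ruby sketch over arrays of hashes, not the library's implementation; `anti_join_rows` is a hypothetical helper name, and it assumes both sides use the same join key names.

```ruby
# Plain-Ruby illustration of anti-join semantics (not RedAmber code).
# anti_join_rows is a hypothetical helper: it keeps the rows of `left`
# whose values for `keys` never occur in `right`.
def anti_join_rows(left, right, keys)
  seen = right.map { |row| row.values_at(*keys) }
  left.reject { |row| seen.include?(row.values_at(*keys)) }
end

left  = [{ id: 1, x: 'a' }, { id: 2, x: 'b' }, { id: 3, x: 'c' }]
right = [{ id: 2, y: true }, { id: 4, y: false }]

# Only the rows with id 1 and id 3 remain; no column of `right` is added.
unmatched = anti_join_rows(left, right, [:id])
```

Because this is a filtering join, the result carries only the columns of the left side.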
#concatenate(*other) ⇒ DataFrame
Also known as: concat, bind_rows
Concatenate other dataframes or tables onto the bottom of self.
- Note: the `#types` of self must be the same as `other#types`.
    # File 'lib/red_amber/data_frame_combinable.rb', line 32
    def concatenate(*other)
      case other
      in [] | [nil] | [[]]
        return self
      in [Array => array]
        # Nop
      else
        array = other
      end

      table_array = array.map do |e|
        case e
        when Arrow::Table
          e
        when DataFrame
          e.table
        else
          raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
        end
      end

      DataFrame.create(table.concatenate(table_array))
    end
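The pattern match at the top of `#concatenate` (and of `#merge`) decides how the variadic arguments are interpreted. The sketch below isolates that step in plain Ruby (3.0 or later); `normalize_args` is a hypothetical name, and symbols stand in for tables.

```ruby
# Sketch of the argument normalization at the top of #concatenate and #merge.
# normalize_args is a hypothetical helper for illustration only.
def normalize_args(*other)
  case other
  in [] | [nil] | [[]]
    []      # nothing to combine; the real methods return self here
  in [Array => array]
    array   # a single array argument is unwrapped
  else
    other   # several positional arguments are used as they are
  end
end

no_args  = normalize_args
wrapped  = normalize_args([:t1, :t2])
splatted = normalize_args(:t1, :t2)
```

Under this reading, passing one array of tables and passing the same tables as separate arguments are treated alike.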
#difference(other) ⇒ DataFrame
Also known as: setdiff
Select records appearing in self but not in other.
- Same as `#join` with `type: :left_anti` when the keys of self are the same as those of other.
- A kind of set operation.
    # File 'lib/red_amber/data_frame_combinable.rb', line 603
    def difference(other)
      unless keys == other.keys.map(&:to_sym)
        raise DataFrameArgumentError, 'keys are not same with self and other'
      end

      join(other, keys, type: :left_anti)
    end
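As a rough picture of the set-difference semantics, here is a plain-Ruby sketch over arrays of hashes. It is not RedAmber code; `difference_rows` is a hypothetical helper, and it assumes both inputs are non-empty so that the keys can be read from the first record.

```ruby
# Plain-Ruby illustration of set difference over records (not RedAmber code).
# difference_rows is a hypothetical helper; both sides must expose the same
# keys in the same order, mirroring the check in #difference.
def difference_rows(left, right)
  left_keys  = left.first.keys
  right_keys = right.first.keys.map(&:to_sym)
  raise ArgumentError, 'keys are not same with self and other' unless left_keys == right_keys

  # Symbolize the keys of `right` so whole records can be compared.
  left - right.map { |row| row.transform_keys(&:to_sym) }
end

a = [{ k: 1, v: 'x' }, { k: 2, v: 'y' }, { k: 3, v: 'z' }]
b = [{ 'k' => 2, 'v' => 'y' }, { 'k' => 9, 'v' => 'q' }]

# The record { k: 2, v: 'y' } appears in both and is removed.
only_in_a = difference_rows(a, b)
```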
#full_join(other, suffix: '.1') ⇒ DataFrame
#full_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#full_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Also known as: outer_join
Join another DataFrame or Table, leaving all records.
- Same as `#join` with `type: :full_outer`.
- A kind of mutating join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 304
    def full_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :full_outer, suffix: suffix)
    end
#inner_join(other, suffix: '.1') ⇒ DataFrame
#inner_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#inner_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Join another DataFrame or Table, leaving only the matching records.
- Same as `#join` with `type: :inner`.
- A kind of mutating join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 247
    def inner_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :inner, suffix: suffix)
    end
#intersect(other) ⇒ DataFrame
Select records appearing in both self and other.
- Same as `#join` with `type: :inner` when the keys of self are the same as those of other.
- A kind of set operation.
    # File 'lib/red_amber/data_frame_combinable.rb', line 547
    def intersect(other)
      unless keys == other.keys.map(&:to_sym)
        raise DataFrameArgumentError, 'keys are not same with self and other'
      end

      join(other, keys, type: :inner)
    end
#join(other, type: :inner, suffix: '.1') ⇒ DataFrame
#join(other, join_keys, type: :inner, suffix: '.1') ⇒ DataFrame
#join(other, join_key_pairs, type: :inner, suffix: '.1') ⇒ DataFrame
- Note: the order of the joined records may not be preserved. To restore it, add an index column before joining and sort by that column afterwards.
    # File 'lib/red_amber/data_frame_combinable.rb', line 724
    def join(other, join_keys = nil, type: :inner, suffix: '.1')
      case other
      when DataFrame
        other = other.table
      when Arrow::Table
        # Nop
      else
        raise DataFrameArgumentError, 'other must be a DataFrame or an Arrow::Table'
      end

      table_keys = table.keys
      other_keys = other.keys
      type = type.to_sym

      # natural keys (implicit common keys)
      join_keys ||= table_keys.intersection(other_keys)

      # This is not necessary if additional procedure is contributed to Red Arrow.
      if join_keys.is_a?(Hash)
        left_keys = join_keys[:left]
        right_keys = join_keys[:right]
      else
        left_keys = join_keys
        right_keys = join_keys
      end
      left_keys = Array(left_keys).map(&:to_s)
      right_keys = Array(right_keys).map(&:to_s)

      case type
      when :full_outer, :left_semi, :left_anti, :right_semi, :right_anti
        left_outputs = nil
        right_outputs = nil
      when :inner, :left_outer
        left_outputs = table_keys
        right_outputs = other_keys - right_keys
      when :right_outer
        left_outputs = table_keys - left_keys
        right_outputs = other_keys
      end

      # Should we rescue errors in Arrow::Table#join for usability ?
      joined_table =
        table.join(other, join_keys, type: type, left_outputs: left_outputs, right_outputs: right_outputs)

      case type
      when :inner, :left_outer, :left_semi, :left_anti, :right_semi, :right_anti
        if joined_table.keys.uniq!
          DataFrame.create(rename_table(joined_table, n_keys, suffix))
        else
          DataFrame.create(joined_table)
        end
      when :full_outer
        renamed_table = rename_table(joined_table, n_keys, suffix)
        renamed_keys = renamed_table.keys
        dropper = []
        DataFrame.create(renamed_table).assign do |df|
          left_keys.map do |left_key|
            i_left_key = renamed_keys.index(left_key)
            right_key = renamed_keys[i_left_key + table_keys.size]
            dropper << right_key
            [left_key.to_sym, merge_array(df[left_key].data, df[right_key].data)]
          end
        end.drop(dropper)
      when :right_outer
        if joined_table.keys.uniq!
          DataFrame.create(rename_table(joined_table, left_outputs.size, suffix))
        else
          DataFrame.create(joined_table)
        end.pick do
          [right_keys, keys.map(&:to_s) - right_keys]
        end
      end
    end
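The three call forms (no keys, `join_keys`, `join_key_pairs`) differ only in how the left and right key lists are derived. The sketch below lifts that step out of the method body into plain Ruby; `resolve_join_keys` is a hypothetical name, and column names are given as plain strings rather than read from tables.

```ruby
# Sketch of how #join resolves its join_keys argument, following the steps
# in the source. resolve_join_keys is a hypothetical helper for illustration.
def resolve_join_keys(table_keys, other_keys, join_keys = nil)
  # Natural keys: the column names both sides have in common.
  join_keys ||= table_keys.intersection(other_keys)

  if join_keys.is_a?(Hash)
    left_keys  = join_keys[:left]
    right_keys = join_keys[:right]
  else
    left_keys = right_keys = join_keys
  end
  [Array(left_keys).map(&:to_s), Array(right_keys).map(&:to_s)]
end

natural  = resolve_join_keys(%w[id name], %w[id score])
explicit = resolve_join_keys(%w[id name], %w[id score], :id)
pairs    = resolve_join_keys(%w[id name], %w[key score], { left: :id, right: :key })
```

A Hash with `:left` and `:right` entries is what lets the two sides join on differently named columns; every other form uses the same names on both sides.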
#left_join(other, suffix: '.1') ⇒ DataFrame
#left_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#left_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Join matching values to self from other.
- Same as `#join` with `type: :left_outer`.
- A kind of mutating join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 360
    def left_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :left_outer, suffix: suffix)
    end
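For readers unfamiliar with mutating joins, the following plain-Ruby sketch shows what a left outer join produces on arrays of hashes. It is not RedAmber code; `left_join_rows` is a hypothetical helper that joins on a single key and does not handle the column-name collisions that `suffix:` exists for.

```ruby
# Plain-Ruby illustration of a left outer join (not RedAmber code).
# left_join_rows is a hypothetical helper. Every row of `left` is kept;
# columns coming from `right` are nil when no row matches on `key`.
def left_join_rows(left, right, key)
  right_columns = right.flat_map(&:keys).uniq - [key]
  blank = right_columns.to_h { |column| [column, nil] }

  left.flat_map do |row|
    matches = right.select { |other| other[key] == row[key] }
    if matches.empty?
      [row.merge(blank)]
    else
      matches.map { |other| row.merge(other) }
    end
  end
end

people = [{ id: 1, name: 'Ann' }, { id: 2, name: 'Bob' }]
scores = [{ id: 1, score: 90 }]

# Ann gets her score; Bob is kept with score set to nil.
joined = left_join_rows(people, scores, :id)
```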
#merge(*other) ⇒ DataFrame
Also known as: bind_cols
Merge other DataFrames or Tables.
- Note: the `#size` of self must be the same as `other#size`.
- Note: self and other must not share the same key.
    # File 'lib/red_amber/data_frame_combinable.rb', line 79
    def merge(*other)
      case other
      in [] | [nil] | [[]]
        return self
      in [Array => array]
        # Nop
      else
        array = other
      end

      hash = array.each_with_object({}) do |e, h|
        df =
          case e
          when Arrow::Table
            DataFrame.create(e)
          when DataFrame
            e
          else
            raise DataFrameArgumentError, "#{e} is not a Table or a DataFrame"
          end

        if size != df.size
          raise DataFrameArgumentError, "#{e} do not have same size as self"
        end

        k = keys.intersection(df.keys).any?
        raise DataFrameArgumentError, "There are some shared keys: #{k}" if k

        h.merge!(df.to_h)
      end

      assign(hash)
    end
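The two preconditions in the notes (equal size, no shared key) can be sketched on plain hashes of columns. This is an illustration only: `merge_columns` is a hypothetical helper, each "frame" is a hash of column name to array, and unlike the source it checks each incoming frame against all columns merged so far and lists the shared keys in the error message.

```ruby
# Plain-Ruby illustration of the checks behind #merge (not RedAmber code).
# merge_columns is a hypothetical helper working on column hashes.
def merge_columns(base, *others)
  size = base.values.first.size
  others.each_with_object(base.dup) do |other, merged|
    unless other.values.all? { |column| column.size == size }
      raise ArgumentError, 'other does not have the same size as self'
    end

    shared = merged.keys.intersection(other.keys)
    raise ArgumentError, "There are some shared keys: #{shared}" unless shared.empty?

    merged.merge!(other)
  end
end

left   = { x: [1, 2, 3] }
right  = { y: %w[a b c] }
merged = merge_columns(left, right)
```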
#right_join(other, suffix: '.1') ⇒ DataFrame
#right_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#right_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Join matching values from self to other.
- Same as `#join` with `type: :right_outer`.
- A kind of mutating join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 414
    def right_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :right_outer, suffix: suffix)
    end
#semi_join(other, suffix: '.1') ⇒ DataFrame
#semi_join(other, join_keys, suffix: '.1') ⇒ DataFrame
#semi_join(other, join_key_pairs, suffix: '.1') ⇒ DataFrame
Return records of self that have a match in other.
- Same as `#join` with `type: :left_semi`.
- A kind of filtering join.
    # File 'lib/red_amber/data_frame_combinable.rb', line 467
    def semi_join(other, join_keys = nil, suffix: '.1')
      join(other, join_keys, type: :left_semi, suffix: suffix)
    end
#set_operable?(other) ⇒ Boolean
Check if set operation with self and other is possible.
    # File 'lib/red_amber/data_frame_combinable.rb', line 529
    def set_operable?(other) # rubocop:disable Naming/AccessorMethodName
      keys == other.keys.map(&:to_sym)
    end
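The check amounts to comparing two arrays of column names after symbolizing the other side's keys. A small plain-Ruby sketch of that comparison follows; `same_keys?` is a hypothetical helper that takes the key arrays directly instead of DataFrames.

```ruby
# Plain-Ruby illustration of the key comparison (not RedAmber code).
# same_keys? is a hypothetical helper mirroring the expression in the source.
def same_keys?(self_keys, other_keys)
  self_keys == other_keys.map(&:to_sym)
end

same      = same_keys?(%i[id name], %w[id name]) # string keys are symbolized first
reordered = same_keys?(%i[id name], %w[name id]) # Array#== is order-sensitive
```

Since `Array#==` compares element by element, the same columns in a different order do not pass this comparison.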
#union(other) ⇒ DataFrame
Select records appearing in self or other.
- Same as `#join` with `type: :full_outer` when the keys of self are the same as those of other.
- A kind of set operation.
    # File 'lib/red_amber/data_frame_combinable.rb', line 573
    def union(other)
      unless keys == other.keys.map(&:to_sym)
        raise DataFrameArgumentError, 'keys are not same with self and other'
      end

      join(other, keys, type: :full_outer)
    end