Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for bill of the House of Representatives of Japan #184

Merged

Conversation

tikkss
Copy link
Contributor

@tikkss tikkss commented Dec 11, 2023

Fixes #142.

Notes for reviewers

Wareki can't parse to Date object (e.g., "平成10年 3月 4日"). I think the following options.

  1. Introduce wareki gem
  2. Parse independently method (Not implemented)
  3. Keep it as a String (Not parsed)

@kou
Copy link
Member

kou commented Dec 12, 2023

The wareki gem will be convenient but it's overkill for us. Because it changes Date.

Parse independently method (Not implemented)

Can we try this? Do we need to support only 昭和, 平成 and 令和?

@kou
Copy link
Member

kou commented Dec 12, 2023

Parse independently method (Not implemented)

Can we try this? Do we need to support only 昭和, 平成 and 令和?

We can defer this to a follow-up task.

@tikkss
Copy link
Contributor Author

tikkss commented Dec 12, 2023

Thanks for your review!

The wareki gem will be convenient but it's overkill for us. Because it changes Date.

I agree.

I would like to implement independently method only support 昭和, 平成, 令和.

The fields capable of accepting dates only contained Heisei and Reiwa.
This fact was confirmed with the following script on 24e251d.

```ruby
require "datasets"

house_of_representative = Datasets::HouseOfRepresentative.new
japanese_date_strings = house_of_representative.collect { |record|
  [
    record.house_of_representatives_of_accepted_bill_on_preliminary_consideration,
    record.house_of_representatives_of_preliminary_refer_on,
    record.house_of_representatives_of_accepted_bill_on,
    record.house_of_representatives_of_refer_on,
    record.house_of_representatives_of_finished_consideration_on,
    record.house_of_representatives_of_finished_deliberation_on,
    record.house_of_councillors_of_accepted_bill_on_preliminary_consideration,
    record.house_of_councillors_of_preliminary_refer_on,
    record.house_of_councillors_of_accepted_bill_on,
    record.house_of_councillors_of_refer_on,
    record.house_of_councillors_of_finished_consideration_on,
    record.house_of_councillors_of_finished_deliberation_on,
    record.promulgated_on,
  ].reject { |s| s.nil? || s == "" }
}.flatten
japanese_date_strings.collect { |s| s[0..1] }
                     .uniq
# => ["平成", "令和"]
```

The private method `parse_japanese_date` was asserted correctly with the
following script.

```ruby
require "date"
require "test-unit"

JAPANESE_DATE_MATCHER = /\A(平成|令和)\s*(\d{1,2}|元)年\s*\d{1,2}月\s*\d{1,2}日\z/.freeze

def parse_japanese_date(field)
  return field unless field.to_s.match?(JAPANESE_DATE_MATCHER)
  Date.parse(normalize_jisx0301_format(field))
end

def normalize_jisx0301_format(japanese_date_string)
  japanese_date_string.gsub(/平成/, "H")
                      .gsub(/令和/, "R")
                      .gsub(/元/, "01")
                      .gsub(/\s/, "0")
                      .gsub(/[年月]/, ".")
                      .gsub(/[日]/, "")
end

class ParseJapaneseDateTest < Test::Unit::TestCase
  data("month and day with leading a space in Heisei", ["H10.01.01", "平成10年 1月 1日"])
  data("month         with leading a space in Heisei", ["H10.01.10", "平成10年 1月10日"])
  data("          day with leading a space in Heisei", ["H10.10.01", "平成10年10月 1日"])
  data("           without leading a space in Heisei", ["H10.10.10", "平成10年10月10日"])
  data("year, month and day with leading a space in Reiwa", ["R02.01.01", "令和 2年 1月 1日"])
  data("year, month         with leading a space in Reiwa", ["R02.01.10", "令和 2年 1月10日"])
  data("year,           day with leading a space in Reiwa", ["R02.10.01", "令和 2年10月 1日"])
  data("year,            without leading a space in Reiwa", ["R02.10.10", "令和 2年10月10日"])
  data("boundary within Heisei", ["H31.04.30", "平成31年 4月30日"])
  data("boundary within Reiwa", ["R01.05.01", "令和元年 5月 1日"])
  def test_parse_japanse_date(data)
    expected_jisx0301, japanese_date_string = data
    assert_equal(expected_jisx0301, parse_japanese_date(japanese_date_string).jisx0301)
  end
end

Test::Unit::AutoRunner.run
# Loaded suite irb
# Started
# Finished in 0.002313 seconds.
# -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# 10 tests, 10 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
# 100% passed
# -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# 4323.39 tests/s, 4323.39 assertions/s
# => true
```
@tikkss tikkss force-pushed the add-house-of-representatives-of-japan branch from a37b13e to 1e15280 Compare December 16, 2023 23:33
@tikkss
Copy link
Contributor Author

tikkss commented Dec 16, 2023

I've rebased on master and update expected, so the CI has successfully.

I implement independently method only support 平成, 令和 (not 昭和, 平成, 令和). Because the fields capable of accepting dates only contained Heisei and Reiwa. This fact was confirmed with the following script on 24e251d.

require "datasets"

house_of_representative = Datasets::HouseOfRepresentative.new
japanese_date_strings = house_of_representative.collect { |record|
  [
    record.house_of_representatives_of_accepted_bill_on_preliminary_consideration,
    record.house_of_representatives_of_preliminary_refer_on,
    record.house_of_representatives_of_accepted_bill_on,
    record.house_of_representatives_of_refer_on,
    record.house_of_representatives_of_finished_consideration_on,
    record.house_of_representatives_of_finished_deliberation_on,
    record.house_of_councillors_of_accepted_bill_on_preliminary_consideration,
    record.house_of_councillors_of_preliminary_refer_on,
    record.house_of_councillors_of_accepted_bill_on,
    record.house_of_councillors_of_refer_on,
    record.house_of_councillors_of_finished_consideration_on,
    record.house_of_councillors_of_finished_deliberation_on,
    record.promulgated_on,
  ].reject { |s| s.nil? || s == "" }
}.flatten
japanese_date_strings.collect { |s| s[0..1] }
                     .uniq
# => ["平成", "令和"]

The private method parse_japanese_date was asserted correctly with the
following script.

require "date"
require "test-unit"

JAPANESE_DATE_MATCHER = /\A(平成|令和)\s*(\d{1,2}|元)年\s*\d{1,2}月\s*\d{1,2}日\z/.freeze

def parse_japanese_date(field)
  return field unless field.to_s.match?(JAPANESE_DATE_MATCHER)
  Date.parse(normalize_jisx0301_format(field))
end

def normalize_jisx0301_format(japanese_date_string)
  japanese_date_string.gsub(/平成/, "H")
                      .gsub(/令和/, "R")
                      .gsub(/元/, "01")
                      .gsub(/\s/, "0")
                      .gsub(/[年月]/, ".")
                      .gsub(/[日]/, "")
end

class ParseJapaneseDateTest < Test::Unit::TestCase
  data("month and day with leading a space in Heisei", ["H10.01.01", "平成10年 1月 1日"])
  data("month         with leading a space in Heisei", ["H10.01.10", "平成10年 1月10日"])
  data("          day with leading a space in Heisei", ["H10.10.01", "平成10年10月 1日"])
  data("           without leading a space in Heisei", ["H10.10.10", "平成10年10月10日"])
  data("year, month and day with leading a space in Reiwa", ["R02.01.01", "令和 2年 1月 1日"])
  data("year, month         with leading a space in Reiwa", ["R02.01.10", "令和 2年 1月10日"])
  data("year,           day with leading a space in Reiwa", ["R02.10.01", "令和 2年10月 1日"])
  data("year,            without leading a space in Reiwa", ["R02.10.10", "令和 2年10月10日"])
  data("boundary within Heisei", ["H31.04.30", "平成31年 4月30日"])
  data("boundary within Reiwa", ["R01.05.01", "令和元年 5月 1日"])
  def test_parse_japanse_date(data)
    expected_jisx0301, japanese_date_string = data
    assert_equal(expected_jisx0301, parse_japanese_date(japanese_date_string).jisx0301)
  end
end

Test::Unit::AutoRunner.run
# Loaded suite irb
# Started
# Finished in 0.002313 seconds.
# --------------------------------------------------------------------------------------------
# 10 tests, 10 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
# 100% passed
# --------------------------------------------------------------------------------------------
# 4323.39 tests/s, 4323.39 assertions/s
# => true

Could you review it again?

lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
@kou
Copy link
Member

kou commented Dec 17, 2023

How about the Japanese date parse method as a Dataset::JapaneseDateParser like other utility class such as ZipExtractor and Downloader? Then we can put the tests to test/test-japanese-date-parser.rb.

@tikkss
Copy link
Contributor Author

tikkss commented Dec 18, 2023

How about the Japanese date parse method as a Dataset::JapaneseDateParser like other utility class such as ZipExtractor and Downloader? Then we can put the tests to test/test-japanese-date-parser.rb.

I see. It is testable! I'll try to extract Datasets::JapaneseDateParser.

@tikkss
Copy link
Contributor Author

tikkss commented Dec 18, 2023

Thanks for your review and suggest some clear code. I dealt with all of them you pointed so far.

lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
lib/datasets/japanese-date-parser.rb Outdated Show resolved Hide resolved
tikkss and others added 7 commits December 20, 2023 05:37
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Following a test is fail because return value was changed on b5e68f4.

- Bill of the House of Representatives of Japan
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Creating a new `JapaneseDateParser` object for each parsing may be slow.

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@tikkss
Copy link
Contributor Author

tikkss commented Dec 20, 2023

Thank you so much. I'm excited to encounter some clear code I had not even thought of.

I dealt with all of them you pointed out so far. Could you review it again?

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we can merge this with some minor improvements. :-)

lib/datasets/house-of-representative.rb Outdated Show resolved Hide resolved
test/japanese-date-parser-test.rb Outdated Show resolved Hide resolved
tikkss and others added 3 commits December 22, 2023 05:43
`end_with?` is simpler than regular expression.

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
@tikkss
Copy link
Contributor Author

tikkss commented Dec 21, 2023

Thanks for your review! I've improved all.

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@kou kou merged commit 723bdf2 into red-data-tools:master Dec 22, 2023
12 checks passed
@tikkss tikkss deleted the add-house-of-representatives-of-japan branch December 22, 2023 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for bill of the House of Representatives of Japan
2 participants