-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Publication Date Parsing #14
base: master
Are you sure you want to change the base?
Changes from all commits
c6d151c
65e0c13
9be2db4
b2b89ea
40df99a
6e07216
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,8 @@ | |
from django.utils.translation import gettext_lazy as _ | ||
|
||
from swp.models import Publication | ||
from swp.utils.date import parse_date | ||
|
||
if TYPE_CHECKING: | ||
from swp.models import Thinktank | ||
|
||
|
@@ -64,6 +66,11 @@ def clean_authors(self) -> Sequence[str]: | |
|
||
return [clean(author) for author in items] | ||
|
||
def clean_publication_dt(self) -> datetime.date: | ||
value = self.truncated_field('publication_date') | ||
default = f'{datetime.date.today().year}-01-01' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When working with dates and datetimes in django consider using the timezone aware utils from |
||
return parse_date(value, default_date_str=default) | ||
|
||
def clean(self) -> Mapping[str, Any]: | ||
super().clean() | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
# Generated by Django 3.2.16 on 2023-04-13 14:56 | ||
|
||
from django.db import migrations, models | ||
|
||
from swp.utils.date import parse_date | ||
|
||
|
||
def migrate_date(apps, schema_editor): | ||
Publication = apps.get_model('swp', 'Publication') | ||
|
||
for publication in Publication.objects.all(): | ||
default = publication.created.date().isoformat() | ||
publication.publication_dt = parse_date(publication.publication_date, default_date_str=default) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Following my initial discussion about losing the information whether a publication date string could be (fully parsed): your field is nullable, maybe keep it null, when no date could be parsed instead of filling it with a misleading date derived from the created date. |
||
publication.save(update_fields=['publication_dt']) | ||
|
||
|
||
class Migration(migrations.Migration): | ||
|
||
dependencies = [ | ||
('swp', '0039_publication_list'), | ||
] | ||
|
||
operations = [ | ||
migrations.AddField( | ||
model_name='publication', | ||
name='publication_dt', | ||
field=models.DateField(blank=True, null=True, verbose_name='publication dt'), | ||
), | ||
migrations.RunPython(code=migrate_date), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Consider adding a reverse migration. In this case a |
||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,7 @@ class Publication(models.Model): | |
abstract = models.TextField(_('abstract'), blank=True) # [AB] | ||
authors = ArrayField(models.CharField(max_length=255), blank=True, null=True, verbose_name=_('authors')) # [AU] | ||
publication_date = models.CharField(_('publication date'), max_length=255, blank=True, default='') # [PY] | ||
publication_dt = models.DateField(_('publication dt'), null=True, blank=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This naming scheme is a bit confusing. It would be better to make clear which field contains which data just by the name. I would recommend:
|
||
last_access = models.DateTimeField(_('last access'), default=timezone.now, editable=False) # [Y2] | ||
url = LongURLField(_('URL')) # [UR] | ||
pdf_url = LongURLField(_('PDF URL'), blank=True) # [L1] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
import re | ||
import datetime as dt | ||
|
||
import datefinder | ||
|
||
|
||
def parse_date(date_str: str, default_date_str: str='1700-01-01') -> dt.date: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why did you decide to pass the default as a string when every current usage of this method has a real date to hand? Why the default I would remove the default all together and return |
||
date_str = ' '.join(date_str.split()) | ||
default_date = dt.datetime.fromisoformat(default_date_str).date() | ||
if not date_str: | ||
return default_date | ||
try: | ||
return dt.datetime.fromisoformat(date_str).date() | ||
except ValueError: | ||
pass | ||
year = default_date.year | ||
years_found = re.findall(r'(19\d{2}|20\d{2})', date_str) | ||
if years_found: | ||
year = int(sorted(years_found)[0]) | ||
if year != default_date.year: | ||
default_date = dt.date(year, 1, 1) | ||
dates_found = sorted([d.date() for d in datefinder.find_dates(date_str)]) | ||
date = dates_found[0] if dates_found else None | ||
if date is None: | ||
return default_date | ||
if years_found and date.year != default_date.year: | ||
return default_date | ||
return date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary to use a truncated value here. You can directly use
self.cleaned_data
.