Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add regexes to ignore entries and evaluate special cases #280

Merged
merged 18 commits into from
Feb 23, 2021

Conversation

AlexanderLill
Copy link
Contributor

No description provided.

@codeclimate
Copy link

codeclimate bot commented Feb 15, 2021

Code Climate has analyzed commit 3f1dca4 and detected 2 issues on this pull request.

Here's the issue category breakdown:

Category Count
Duplication 2

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 99.3% (0.3% change).

View more on Code Climate.

Copy link
Owner

@ChrisRBe ChrisRBe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know your thoughts on my comments :)

@@ -108,6 +108,9 @@ def __format_statement(self, statement):
if not category:
return

if category == "Ignored":
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can move this up to the statement in line 108:

if not category or category == "Ignored"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks :)

src/Statement.py Outdated
@@ -46,8 +56,11 @@ def is_fee(self, booking_type):

def get_date(self):
""" get the date of the statement """
date_to_parse = self._statement[self._config.get_booking_date()]
if date_to_parse == "":
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if date_to_parse == "":
if not date_to_parse:

empty strings are usually evaluated as None in python. I don't know if you knew that already :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored so that does not show up anymore. I knew, but sometimes I tend to be overly explicit with these if statements ^^

src/Statement.py Outdated
@@ -27,6 +27,7 @@ def get_category(self):
:return: category of the statement; if unknown return the empty string
"""
booking_type = self._statement[self._config.get_booking_type()]
value = self.get_value()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. this variable is only read twice. could be discussed if necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored so that does not show up anymore, but I agree, it was probably not necessary

src/Statement.py Outdated
Comment on lines 40 to 48
elif self._config.get_special_entry_regex() and self._config.get_special_entry_regex().match(booking_type):
if value > 0:
category = "Zinsen"
elif value < 0:
category = "Gebühren"
else:
logging.debug(self._statement)
elif self._config.get_ignorable_entry_regex() and self._config.get_ignorable_entry_regex().match(booking_type):
category = "Ignored"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move the if statements to its own methods (see is_fee). This is mostly to reduce the complexity of the overall function.
This could look something like:

Suggested change
elif self._config.get_special_entry_regex() and self._config.get_special_entry_regex().match(booking_type):
if value > 0:
category = "Zinsen"
elif value < 0:
category = "Gebühren"
else:
logging.debug(self._statement)
elif self._config.get_ignorable_entry_regex() and self._config.get_ignorable_entry_regex().match(booking_type):
category = "Ignored"
elif self.is_special_entry_interest(booking_type):
category = "Zinsen"
elif self.is_special_entry_fee(booking_type):
category = "Gebühren"
elif self.is_ignorable(booking_type):
category = "Ignored"
def is_ignorable(self, booking_type):
return self._config.get_ignorable_entry_regex() and self._config.get_ignorable_entry_regex().match(booking_type)
def is_special_entry_interest(self, booking_type):
return self._config.get_special_entry_regex() and self._config.get_special_entry_regex().match(booking_type) and self.value() >= 0
def is_special_entry_fee(self, booking_type):
return self._config.get_special_entry_regex() and self._config.get_special_entry_regex().match(booking_type) and self.value() < 0

Overall (in retrospect) not too happy with how much this if branching explodes :) I should have come up with a better way to handle these different categories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tried to make it a little more clear in the refactored version, let me know what you think :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also put the code for handling the special mintos case in a separate function, I am not too happy about the whole part, but the format of the csv does not make it easy to parse the category :|

src/Statement.py Outdated
@@ -46,8 +56,11 @@ def is_fee(self, booking_type):

def get_date(self):
""" get the date of the statement """
date_to_parse = self._statement[self._config.get_booking_date()]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required? Are there statements without date? not so nice. hmm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a downloaded csv with a line without date, but in the file I downloaded today all entries have a date.

So we could keep it to be more resilient for missing dates, as otherwise the parser just throws an error and stops.

src/Statement.py Outdated
Comment on lines 59 to 65
date_to_parse = self._statement[self._config.get_booking_date()]
if date_to_parse == "":
return datetime(1970, 1, 1).date()
return datetime.strptime(
self._statement[self._config.get_booking_date()],
date_to_parse,
self._config.get_booking_date_format(),
).date()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. would this be cleaner?

        if self._statement[self._config.get_booking_date()]:
            statement_date = datetime.strptime(
                self._statement[self._config.get_booking_date()], self._config.get_booking_date_format()
            ).date()
        else:
            statement_date = datetime(1970, 1, 1).date()
        return statement_date

Could you also add a sample entry to the mintos test data to verify that an empty entry is correctly processed? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that looks a lot better, thanks!

@AlexanderLill
Copy link
Contributor Author

AlexanderLill commented Feb 17, 2021

@ChrisRBe thanks for the very useful feedback :) I might be able to work on it on the weekend and also add some tests. I saw the no-date entry the first time, but maybe it was some kind of incomplete transaction on mintos' side, it was the last (and most recent) row in my csv export.

@AlexanderLill
Copy link
Contributor Author

From my side I would now wait for further feedback. Unfortunately the current version does not pass the codeclimate requirement for complexity 5 (it has complexity 6).

src/Statement.py Outdated Show resolved Hide resolved
src/Statement.py Outdated Show resolved Hide resolved
src/Statement.py Outdated
Comment on lines 56 to 57
# This is currently a special case for the Mintos "discount/premium" secondary market transactions parsing,
# where an entry might be a fee or an income depending on its sign.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This is currently a special case for the Mintos "discount/premium" secondary market transactions parsing,
# where an entry might be a fee or an income depending on its sign.
"""
This is currently a special case for the Mintos "discount/premium" secondary market transactions parsing,
where an entry might be a fee or an income depending on its sign.
:param value: how much money was returned/paid
:return: Zinsen if value >= 0; Gebühren if < 0; Ignored in any other case
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks better, thanks :)

src/Statement.py Outdated
Comment on lines 59 to 64
if value >= 0:
return "Zinsen"
elif value < 0:
return "Gebühren"
else:
return "Ignored"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm wrong, the else statement is only for the case where value is not a number, right? hm. Should work I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, it might be unexpected, but at least that way the "Undecided" will not make it into the next higher level where it is not handled.

@ChrisRBe
Copy link
Owner

Hey @AlexanderLill, looks pretty good already. only one or two minor comments. Let me have a look at codeclimate, what he is bitching about.

@ChrisRBe
Copy link
Owner

Hey @AlexanderLill, looks pretty good already. only one or two minor comments. Let me have a look at codeclimate, what he is bitching about.

I had another look and added a PR on your fork repository. Let me know what you think.

@AlexanderLill
Copy link
Contributor Author

@ChrisRBe thanks for the feedback, I merged your branch into mine and implemented your feedback. For me this PR on github is getting a bit confusing (especially with the comments on code parts that have changed in the meantime) but I hope I did not forget anything :)

@ChrisRBe
Copy link
Owner

Okay. There's only 2 issues remaining on coding content regarding duplicate code in the writer. I accepted those already and really should have a look at later.

@ChrisRBe
Copy link
Owner

@ChrisRBe thanks for the feedback, I merged your branch into mine and implemented your feedback. For me this PR on github is getting a bit confusing (especially with the comments on code parts that have changed in the meantime) but I hope I did not forget anything :)

I will have another look, but it looks really good already :)

src/Statement.py Outdated Show resolved Hide resolved
@ChrisRBe
Copy link
Owner

@AlexanderLill looks good overall now :) Let me know when you want to merge the PR. 👍

@AlexanderLill AlexanderLill changed the title WIP: Add regexes to ignore entries and evaluate special cases Add regexes to ignore entries and evaluate special cases Feb 23, 2021
@AlexanderLill
Copy link
Contributor Author

@ChrisRBe great, from my side it is ready to merge, feel free to do it whenever you can :)

@ChrisRBe ChrisRBe merged commit 493fd68 into ChrisRBe:master Feb 23, 2021
@AlexanderLill AlexanderLill deleted the mintos_extended_parsing branch December 11, 2021 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants