-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues when parsing CSV files that extensively use text qualifiers #60
Comments
Hi
Thanks for your report. I will gratefully accept a PR for a unit test (and
a fix as well, if you can but test is good).
As far as I can see, the second example is not a valid CSV due to space
before the double quote: Smith,*<issue>"*""Test....
Does Flatpack work if you remove that space?
Thanks
Benoit
…On Mon, 26 Apr 2021 at 11:18, mvlakh ***@***.***> wrote:
Hi,
I am using Camel to process CSV files and as I understand Camel utilizes
Flatpack to parse CSV content. It looks like there are several defects in
Flatpack that do not allow to parse CSV files properly if they use text
qualifier a lot, looks like there are several edge cases when the library
cannot handle content properly:
- if there multiline string like this one the library handles it
incorrectly:
***@***.***,"This is a long fragment of text
that should be processed as a single field", 1988, 111-222-33,"another field with new line character
that should be considered as a field of the same data row"
It looks like it tries to consume it as separate CSV rows and not as one
row
- if string starts with or contains escaped text qualifier characters
that are part of the string value, the library tries to consume one string
as several separate cells:
Bob, Smith, """Test"" , 2, Some string, still string, also part of the string.",11111111
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#60>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAB542OGYCA6MM2XJGMHN7DTKU4WVANCNFSM43SSKJKQ>
.
|
…or one of the found defects.
Hello Benoit, That space was a typo, in my case it was not present so I undated my past comment. Thank you, |
Unit tests that replicate the issue #60
Hi Mykhailo
I'm sorry if I misunderstood but I only looked at the merged code now in
more details and it seems that the test testCsvDocumentWithMultilineString
if failing?
Didn't you provide a fix at the same time with the Pull Request?
Could you kindly double check?
Many thanks
Benoit
…On Mon, 26 Apr 2021 at 15:13, mvlakh ***@***.***> wrote:
Hello Benoit,
That space was a typo, in my case it was not present so I undated my past
comment.
Prepared a pull request with 2 unit tests that replicate the issues and
one fix suggestion
Thank you,
Mykhailo
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#60 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAB542JZBMW5WWXWGQC3EUDTKVYIJANCNFSM43SSKJKQ>
.
|
Hello Benoit, Yes it looks like you misunderstood me a bit. My pull requested contained 2 tests ,that reproduce 2 issues, and a fix only for one of these issues. So it is expected that the other test testCsvDocumentWithMultilineString still fails. As a workaround for this issue with multiline string, that does not have a fix as of now, I implemented my own version of the function fetchNextRecord and injected it via inheritance. I do not know if my alternative implementation is completely correct but it solved the issues I had. Attaching the class here for you to take a look. If this code makes sense you can try to use it instead of the existing impl to see how well it works for all the cases. I do not want to create a pull request because this is an experimental change and I do not know if you can consume it as whole, most likely you will need to completely revisit this impl. Thank you, |
Thanks for getting back to me.
I have modified the base code following your suggestion but the
CsvParserTest is still failing on the getRowCount() (L88) it is strange...
does it pass for sure with your code?
I must have done something stupid...
Thanks
Benoit
…On Thu, 27 May 2021 at 08:22, mvlakh ***@***.***> wrote:
Hello Benoit,
Yes it looks like you misunderstood me a bit. My pull requested contained
2 tests ,that reproduce 2 issues, and a fix only for one of these issues.
So it is expected that the other test testCsvDocumentWithMultilineString
still fails.
As a workaround for this issue with multiline string, that does not have a
fix as of now, I implemented my own version of the function fetchNextRecord
and injected it via inheritance. I do not know if my alternative
implementation is completely correct but it solved the issues I had.
Attaching the class here for you to take a look. If this code makes sense
you can try to use it instead of the existing impl to see how well it works
for all the cases.
I do not want to create a pull request because this is an experimental
change and I do not know if you can consume it as whole, most likely you
will need to completely revisit this impl.
Thank you,
Mykhailo
DelimiterParser.txt
<https://github.com/Appendium/flatpack/files/6551895/DelimiterParser.txt>
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#60 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAB542LLQ2WHZWG2AB4C5LDTPXXJPANCNFSM43SSKJKQ>
.
|
Hi, Sorry for a delay, had a very busy week. I made a mistake in my test and this is the reason it does not work for you. I have created a pull request that contains fixes for that test and my custom fetch next record function impl that fixes that test. Please take a look at it and see if that can help you to clarify everything. The alternative fetch function implementation seems work fine, at least it solves my issues and I have not issues with it. But I cannot be sure that it covers all the cases. Thank you, |
Fixed unit test that replicates the issue qualifiers #60 and a fix suggestion for it and similar issues
Hi,
I am using Camel to process CSV files and as I understand Camel utilizes Flatpack to parse CSV content. It looks like there are several defects in Flatpack that do not allow to parse CSV files properly if they use text qualifier a lot, looks like there are several edge cases when the library cannot handle content properly:
It looks like it tries to consume it as separate CSV rows and not as one row
The text was updated successfully, but these errors were encountered: