🐛 Fix attachment regex to allow lowercase characters and underscores
Also, since we're now being less strict with the regex, I'll add the restriction that the attachment should be the first thing in the message to help avoid false positives. HOPEFULLY this assumption won't bite me in the ass later

As always, we also have to check for the LTR / RTL mark characters (U+200E and U+200F), because WhatsApp really loves them

Closes #260
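
A minimal sketch of what the two restrictions described above do in practice, using the new regex from this commit. The helper `extractFileName` is hypothetical and exists only for illustration, and applying the regex directly to the message body (the text after `author: `) is an assumption about how the parser uses it.

```typescript
// Regex copied from the src/parser.ts change in this commit.
const regexAttachment =
  /^(?:\u200E|\u200F)*(?:<.+:(.+)>|([\w-]+\.\w+)\s[(<].+[)>])/;

// Hypothetical helper (not part of the library): returns the captured
// file name from whichever alternative matched, or null if none did.
function extractFileName(body: string): string | null {
  const match = regexAttachment.exec(body);
  if (!match) return null;
  return match[1] ?? match[2] ?? null;
}

// Leading LTR mark (U+200E) is skipped, and the lowercase name is accepted.
console.log(
  extractFileName(
    '\u200E4f2680f1db95a8454775cc2eefc95bfc.jpg (Datei angehängt)\nDir auch frohe Ostern.',
  ),
); // 4f2680f1db95a8454775cc2eefc95bfc.jpg

// The attachment is the first thing in the message, so it matches.
console.log(extractFileName('IMG-20210428-WA0001.jpg (file attached)'));
// IMG-20210428-WA0001.jpg

// Something that merely looks like an attachment later in the message
// is rejected, because the pattern is anchored with ^.
console.log(extractFileName('see IMG-1.jpg (old)')); // null
```

The anchor is what keeps the looser `[\w-]+` character class from matching ordinary text that happens to contain a file name followed by parentheses.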
Pustur committed Sep 20, 2024
1 parent 1ddd9d5 commit 9d68c61
Showing 2 changed files with 11 additions and 1 deletion.
3 changes: 2 additions & 1 deletion src/parser.ts
@@ -19,7 +19,8 @@ const regexParserSystem = new RegExp(
sharedRegex.source + messageRegex.source,
'i',
);
-const regexAttachment = /<.+:(.+)>|([A-Z\d-]+\.\w+)\s[(<].+[)>]/;
+const regexAttachment =
+  /^(?:\u200E|\u200F)*(?:<.+:(.+)>|([\w-]+\.\w+)\s[(<].+[)>])/;

/**
* Takes an array of lines and detects the lines that are part of a previous
9 changes: 9 additions & 0 deletions tests/parser.test.ts
@@ -207,11 +207,14 @@ describe('parser.js', () => {
'3/6/18, 1:55 p.m. - a: IMG-20210428-WA0001.jpg (file attached)';
const format3 =
'3/6/18, 1:55 p.m. - a: 2015-08-04-PHOTO-00004762.jpg <‎attached>';
const format4 =
'3/6/18, 1:55 p.m. - a: ‎4f2680f1db95a8454775cc2eefc95bfc.jpg (Datei angehängt)\nDir auch frohe Ostern.';
const messages = [
{ system: false, msg: format1 },
{ system: false, msg: '3/6/18, 1:55 p.m. - a: m' },
{ system: false, msg: format2 },
{ system: false, msg: format3 },
{ system: false, msg: format4 },
];
const parsedWithoutAttachments = parseMessages(messages, {
parseAttachments: false,
@@ -245,6 +248,12 @@ describe('parser.js', () => {
'2015-08-04-PHOTO-00004762.jpg',
);
});

it('should correctly parse the attachment string with format #4', () => {
expect(parsedWithAttachments[4]?.attachment?.fileName).toBe(
'4f2680f1db95a8454775cc2eefc95bfc.jpg',
);
});
});
});
});
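
For comparison, a small sketch that runs the old and new patterns from `src/parser.ts` side by side on attachment-style message bodies. The names `oldRegexAttachment`, `newRegexAttachment`, and `fileNameFrom` are made up for this illustration, and the `my_photo.jpg` sample is invented to show the underscore case; only the two regex literals come from the diff.

```typescript
// Both patterns are copied from the src/parser.ts hunk in this commit.
const oldRegexAttachment = /<.+:(.+)>|([A-Z\d-]+\.\w+)\s[(<].+[)>]/;
const newRegexAttachment =
  /^(?:\u200E|\u200F)*(?:<.+:(.+)>|([\w-]+\.\w+)\s[(<].+[)>])/;

// Hypothetical helper: file name captured by either alternative, or null.
function fileNameFrom(regex: RegExp, body: string): string | null {
  const match = regex.exec(body);
  return match ? (match[1] ?? match[2] ?? null) : null;
}

const samples = [
  'IMG-20210428-WA0001.jpg (file attached)', // uppercase, digits, dashes
  '4f2680f1db95a8454775cc2eefc95bfc.jpg (Datei angehängt)', // lowercase hex
  'my_photo.jpg (file attached)', // invented sample with an underscore
];

for (const body of samples) {
  console.log({
    body,
    old: fileNameFrom(oldRegexAttachment, body),
    new: fileNameFrom(newRegexAttachment, body),
  });
}
// The old pattern only finds the first sample; the new one finds all three.
```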