DICOM writer: initial support for dual personality DICOM/TIFF #3911
Conversation
...so that other writers can use it too.
…fset Useful for cases where all IFDs are written sequentially.
Could more of the background be added here? Is this more a design, performance, or just time constraint?
1572bae and 0e65225 are the commits from #3912, cherry-picked onto this branch so that we can have pre-built jars here to help with review. The current build artifact is:
@joshmoore : leaving out the OME-XML is mostly a time constraint at the moment. We may add it later, but want to make sure that the basic TIFF metadata writing is working first.
👍
@dclunie @fedorov : the latest commit here (a386b0a) adds an option to turn off dual personality writing. Setting the "dicom.dual_personality" option to false, e.g.

$ bfconvert -option dicom.dual_personality false /path/to/input /path/to/output.dcm

will omit the TIFF header from the preamble, as well as the trailing padding tag that would have contained IFDs. By default, dual personality data will still be written.
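To illustrate what the option toggles: a standard DICOM file begins with a 128-byte preamble followed by the "DICM" marker, and a dual-personality file additionally places a TIFF header (magic bytes "II*\0" or "MM\0*") at the start of that preamble. The sketch below is illustrative only and is not the detection logic used by Bio-Formats or libtiff:

```python
def classify(header: bytes) -> str:
    """Classify the first bytes of a file as plain DICOM,
    dual-personality DICOM/TIFF, plain TIFF, or unknown.
    Illustrative sketch; real readers also validate IFD offsets."""
    # Classic TIFF magic (little/big endian) plus BigTIFF variants.
    tiff_magics = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")
    is_tiff = header[:4] in tiff_magics
    # DICOM: 128-byte preamble, then the 4-byte "DICM" prefix.
    is_dicom = len(header) >= 132 and header[128:132] == b"DICM"
    if is_tiff and is_dicom:
        return "dual-personality DICOM/TIFF"
    if is_dicom:
        return "DICOM"
    if is_tiff:
        return "TIFF"
    return "unknown"
```

With "dicom.dual_personality false", the first four bytes are no longer a TIFF magic number, which is why tiffinfo later reports "bad magic number" for those files.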
…icom-dual-personality
JPEG still needs some work to be compliant with ImageMagick.
Merge conflicts resolved. In addition to what's noted above, I'd expect the following:
This is best tested with brightfield and fluorescence whole slides, but fake data could be used as an input as well.
I tested:
and it works fine, but adding the "-option dicom.dual_personality false" failed ...
As noted in the comment, the IFDs will always be null if dual personality writing is turned off.
Thanks, @sbesson. That's an edge case with files that have more than 2GB of pixel data. Notably, d56bcdd should fix this; I believe it was only a reader issue and not a writer issue.
Seeing a similar, but slightly different, exception when converting the NDPI file DM0014 - 2020-04-02 11.10.47.ndpi. With the new option set to false and a tile size of 512, attempting to read the individual resolutions with the default DicomReader results in the exception below (taken from the lowest resolution level):
Moving each file to a separate folder allows each to be opened individually without the exception.
Sorry, missed the most recent commit; initial retest looks promising.
Retested with the last commit. Using the sample image, generated a series of DICOM WSI files using all supported compression schemes
./artifact/bftools/bfconvert HandEcompressed_Scan1.qptiff default.dcm -tilex 512 -tiley 512 -noflat
./artifact/bftools/bfconvert HandEcompressed_Scan1.qptiff jpeg.dcm -tilex 512 -tiley 512 -noflat -compression JPEG
./artifact/bftools/bfconvert HandEcompressed_Scan1.qptiff j2k.dcm -tilex 512 -tiley 512 -noflat -compression JPEG-2000
./artifact/bftools/bfconvert HandEcompressed_Scan1.qptiff nodual.dcm -tilex 512 -tiley 512 -noflat -option dicom.dual_personality false
Confirmed that for all samples except the one where the dual personality flag is set to false, the generated files are recognized as TIFF with the appropriate compression scheme
(java11) sbesson@Sebastiens-MacBook-Pro-2 Downloads % for i in $(ls *.dcm); do echo $i && tiffinfo $i | grep Compression; done
default_0_0.dcm
Compression Scheme: None
default_0_1.dcm
Compression Scheme: None
default_0_2.dcm
Compression Scheme: None
default_0_3.dcm
Compression Scheme: None
default_0_4.dcm
Compression Scheme: None
default_1_0.dcm
Compression Scheme: None
default_2_0.dcm
default_2_0.dcm: Warning, Nonstandard tile width 2084, convert file.
default_2_0.dcm: Warning, Nonstandard tile length 4024, convert file.
Compression Scheme: None
default_3_0.dcm
default_3_0.dcm: Warning, Nonstandard tile width 521, convert file.
default_3_0.dcm: Warning, Nonstandard tile length 449, convert file.
Compression Scheme: None
j2k_0_0.dcm
Compression Scheme: 33003 (0x80eb)
j2k_0_1.dcm
Compression Scheme: 33003 (0x80eb)
j2k_0_2.dcm
Compression Scheme: 33003 (0x80eb)
j2k_0_3.dcm
Compression Scheme: 33003 (0x80eb)
j2k_0_4.dcm
Compression Scheme: 33003 (0x80eb)
j2k_1_0.dcm
Compression Scheme: 33003 (0x80eb)
j2k_2_0.dcm
j2k_2_0.dcm: Warning, Nonstandard tile width 2084, convert file.
j2k_2_0.dcm: Warning, Nonstandard tile length 4024, convert file.
Compression Scheme: 33003 (0x80eb)
j2k_3_0.dcm
j2k_3_0.dcm: Warning, Nonstandard tile width 521, convert file.
j2k_3_0.dcm: Warning, Nonstandard tile length 449, convert file.
Compression Scheme: 33003 (0x80eb)
jpeg_0_0.dcm
Compression Scheme: JPEG
jpeg_0_1.dcm
Compression Scheme: JPEG
jpeg_0_2.dcm
Compression Scheme: JPEG
jpeg_0_3.dcm
Compression Scheme: JPEG
jpeg_0_4.dcm
Compression Scheme: JPEG
jpeg_1_0.dcm
Compression Scheme: JPEG
jpeg_2_0.dcm
jpeg_2_0.dcm: Warning, Nonstandard tile width 2084, convert file.
jpeg_2_0.dcm: Warning, Nonstandard tile length 4024, convert file.
Compression Scheme: JPEG
jpeg_3_0.dcm
jpeg_3_0.dcm: Warning, Nonstandard tile width 521, convert file.
jpeg_3_0.dcm: Warning, Nonstandard tile length 449, convert file.
Compression Scheme: JPEG
nodual_0_0.dcm
nodual_0_0.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_0_1.dcm
nodual_0_1.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_0_2.dcm
nodual_0_2.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_0_3.dcm
nodual_0_3.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_0_4.dcm
nodual_0_4.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_1_0.dcm
nodual_1_0.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_2_0.dcm
nodual_2_0.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
nodual_3_0.dcm
nodual_3_0.dcm: Not a TIFF or MDI file, bad magic number 0 (0x0).
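Two details of the tiffinfo output above are worth decoding. The Compression Scheme values map to TIFF tag 259: 1 is uncompressed, 7 is JPEG, and 33003 (0x80eb) is the vendor code Aperio uses for JPEG-2000. The "Nonstandard tile width/length" warnings appear because the TIFF 6.0 specification requires tile dimensions to be multiples of 16; libtiff warns but still reads such files. A small sketch of both checks (the helper names are illustrative, not from the Bio-Formats source):

```python
# TIFF tag 259 (Compression) values seen in the tiffinfo output above.
COMPRESSION_NAMES = {
    1: "None",
    7: "JPEG",
    33003: "JPEG-2000 (Aperio vendor code, 0x80eb)",
}

def describe_compression(value: int) -> str:
    """Translate a TIFF Compression tag value to a readable name."""
    return COMPRESSION_NAMES.get(value, f"unknown ({value:#06x})")

def tile_warnings(width: int, length: int) -> list:
    """TIFF 6.0 requires TileWidth/TileLength to be multiples of 16;
    reproduce the warnings tiffinfo would emit otherwise."""
    warnings = []
    if width % 16 != 0:
        warnings.append(f"Nonstandard tile width {width}")
    if length % 16 != 0:
        warnings.append(f"Nonstandard tile length {length}")
    return warnings
```

This matches the listing: 512x512 tiles produce no warning, while the 2084x4024 and 521x449 sub-images (none of which are multiples of 16) each produce two.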
Also compared the reading of a selected region of the largest resolution
./artifact/bftools/showinf -format MinimalTiff jpeg_0_0.dcm -crop 1024,1024,2048,2048
./artifact/bftools/showinf -format MinimalTiff jpeg_0_0.dcm -crop 10240,10240,2048,2048
./artifact/bftools/showinf -format MinimalTiff j2k_0_0.dcm -crop 10240,10240,2048,2048
./artifact/bftools/showinf -format MinimalTiff nodual_0_0.dcm -crop 10240,10240,2048,2048
./artifact/bftools/showinf -format MinimalTiff default_0_0.dcm -crop 10240,10240,2048,2048
./artifact/bftools/showinf default_0_0.dcm -crop 10240,10240,2048,2048
./artifact/bftools/showinf nodual_0_0.dcm -crop 10240,10240,2048,2048 -noflat
as well as the reading of a low resolution
./artifact/bftools/showinf nodual_0_0.dcm -noflat -resolution 4
./artifact/bftools/showinf default_0_0.dcm -noflat -resolution 4
./artifact/bftools/showinf -format MinimalTiff default_0_4.dcm
./artifact/bftools/showinf -format MinimalTiff jpeg_0_4.dcm
./artifact/bftools/showinf -format MinimalTiff j2k_0_4.dcm
All the results are visually consistent with each other. Based on the above, this looks ready for inclusion in the upcoming release of Bio-Formats.
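The -crop x,y,w,h reads above pull a sub-region from a tiled image, which internally means touching only the tiles that intersect the requested rectangle. A hypothetical sketch of that intersection computation (not the actual Bio-Formats code):

```python
def tiles_for_crop(x, y, w, h, tile_w, tile_h):
    """Return (col, row) indices of all tiles intersecting the
    crop rectangle (x, y, w, h). Illustrative only."""
    first_col, last_col = x // tile_w, (x + w - 1) // tile_w
    first_row, last_row = y // tile_h, (y + h - 1) // tile_h
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For example, with 512x512 tiles a crop of 10240,10240,2048,2048 touches a 4x4 block of 16 tiles, so only a small fraction of the full resolution needs to be decoded.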
Retesting and still seeing some similar issues, using sample file https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/openslide/CMU-3/. The same issue occurs with or without the dual personality flag, and the same thing happens with different Java versions.
Also seeing failures in the repo tests this morning: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastUnsuccessfulBuild/console
Excluding and bumping to 7.0.0 as discussed earlier; I'll pick this back up next week.
Removing exclude label, as I believe the last few commits here should fix the tests. The issue noted in #3911 (comment) is addressed in c891f10; that will fail fast in the case where more than 4GB of pixel data would be written to a single file with no compression. To my knowledge, there is no valid way to write more than 4GB of uncompressed pixel bytes (see e.g. the …
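The 4GB fail-fast follows from the encoding itself: uncompressed DICOM pixel data is stored with a 32-bit value length, so a single file cannot hold more than 2^32 bytes of uncompressed pixels. A sketch of the kind of pre-flight check described (function and constant names are illustrative, not the actual DicomWriter code):

```python
MAX_UNCOMPRESSED = 2 ** 32  # 32-bit DICOM value-length cap

def uncompressed_bytes(width, height, channels, bytes_per_sample):
    """Size of one fully decoded image plane in bytes."""
    return width * height * channels * bytes_per_sample

def check_writable(width, height, channels, bytes_per_sample, compressed):
    """Mimic the fail-fast: refuse to write more than 4GB of
    uncompressed pixel data to a single file."""
    size = uncompressed_bytes(width, height, channels, bytes_per_sample)
    if not compressed and size > MAX_UNCOMPRESSED:
        raise ValueError(
            "Cannot write more than 4GB of uncompressed pixel data. "
            "Specify a compression type instead.")
    return size
```

A 60000x60000 RGB slide at 8 bits per sample is ~10.8 GB uncompressed, so it trips the check unless a compression type is selected; compressed writes are unaffected because encapsulated pixel data can be split into fragments.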
...since compression isn't set until after setId.
Assuming builds pass overnight, I have no more planned changes here. I have retested conversions listed in:
with 0f7d1c6 and no longer see issues with converting or reading the results. I have also run most of the automated tests locally with 0f7d1c6 and not observed any failures so far.
Retested using the sample sets that had failed previously:
https://downloads.openmicroscopy.org/images/Vectra-QPTIFF/perkinelmer/PKI_scans/HandEcompressed_Scan1.qptiff
https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/openslide/CMU-3/
https://downloads.openmicroscopy.org/images/Hamamatsu-NDPI/hamamatsu/DM0014%20-%202020-04-02%2011.10.47.ndpi
Tested using bfconvert, with and without the dual personality flag. When the flag is set to false, the resulting DICOM files do not contain any TIFF metadata when tested with tiffinfo. The files can still be opened and read as standard DICOM, as expected.
When the flag is set to true (or by default, without the flag set), the resulting DICOM files are recognised as TIFF and tiffinfo displays the associated metadata. The files open and display as expected both by default and when using the -format flag to force reading with MinimalTiffReader.
Retested the same files, this time with compression set to JPEG and JPEG-2000. In both instances the resulting files can be read and opened, either as DICOM or TIFF.
Overall the PR looks good to merge from my side.
Retested using two of the source samples mentioned above with the following conversion options:

Source | Tile size | Compression | Dual-personality |
---|---|---|---|
CMU-3.ndpi | 512x512 | None | Yes |
CMU-3.ndpi | 2048x2048 | None | Yes |
CMU-3.ndpi | 512x512 | JPEG-2000 | Yes |
CMU-3.ndpi | 512x512 | JPEG | Yes |
CMU-3.ndpi | 512x512 | None | No |
CMU-3.ndpi | 512x512 | JPEG | No |
HandEcompressed_Scan1.qptiff | 512x512 | None | Yes |
HandEcompressed_Scan1.qptiff | 2048x2048 | None | Yes |
HandEcompressed_Scan1.qptiff | 512x512 | JPEG-2000 | Yes |
HandEcompressed_Scan1.qptiff | 512x512 | JPEG | Yes |
HandEcompressed_Scan1.qptiff | 512x512 | None | No |
HandEcompressed_Scan1.qptiff | 512x512 | JPEG | No |
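A matrix like the one above expands mechanically into bfconvert invocations; a small generator, using the same flags seen earlier in this conversation (the function itself is just a convenience sketch):

```python
def bfconvert_command(src, out, tile, compression, dual):
    """Build a bfconvert command line for one row of the test matrix.
    Flag names match those used earlier in this conversation."""
    cmd = ["bfconvert", src, out,
           "-tilex", str(tile), "-tiley", str(tile), "-noflat"]
    if compression != "None":
        cmd += ["-compression", compression]
    if not dual:
        # Writer option that disables the dual-personality TIFF header.
        cmd += ["-option", "dicom.dual_personality", "false"]
    return " ".join(cmd)
```

For example, the last CMU-3 row becomes `bfconvert CMU-3.ndpi out.dcm -tilex 512 -tiley 512 -noflat -compression JPEG -option dicom.dual_personality false`.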
% du -csh *
272M CMU-3.ndpi
72K CMU-3_2k
72K CMU-3_512
2.0G CMU-3_j2k
190M CMU-3_jpeg
72K CMU-3_nodual
190M CMU-3_nodualjpeg
416M HandEcompressed_Scan1.qptiff
4.6G HandEcompressed_Scan1_2k
3.2G HandEcompressed_Scan1_512
956M HandEcompressed_Scan1_j2k
177M HandEcompressed_Scan1_jpeg
3.2G HandEcompressed_Scan1_nodual
177M HandEcompressed_Scan1_nodualjpeg
15G total
- all files generated with the dual personality option are valid TIFF as confirmed by tiffinfo
- all files generated without the dual personality option fail tiffinfo with `Not a TIFF or MDI file, bad magic number 11825 (0x2e31)`
- the conversion of the CMU-3.ndpi file without compression shows an
Exception in thread "main" loci.formats.FormatException: Cannot write more than 4GB of uncompressed pixel data. Specify a compression type instead.
either with 512x512 or 2048x2048 tiles. The conversion produces 8.7K DICOM files which are initialized by DicomReader but display corrupted pixel data
- all other CMU-3 converted files (with compression) open without issue with DicomReader and display valid pixel data
- the converted HandEcompressed_Scan1 datasets with 512x512 or 2048x2048 tiles, no compression, and the dual-personality option both fail initialization with an
IllegalArgumentException
(java11) sbesson@sebastiensmbp2 DICOM % ~/Documents/GitHub/bioformats/tools/showinf -noflat -crop 10240,10240,2048,2048 HandEcompressed_Scan1_512/HandEcompressed_Scan1_512_0_0.dcm
Checking file format [DICOM]
Initializing reader
DicomReader initializing HandEcompressed_Scan1_512/HandEcompressed_Scan1_512_0_0.dcm
Verifying DICOM format
Reading tags
Exception in thread "main" java.lang.IllegalArgumentException: newPosition > limit: (25704 > 25703)
at java.base/java.nio.Buffer.createPositionException(Buffer.java:318)
at java.base/java.nio.Buffer.position(Buffer.java:293)
at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)
at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:262)
at loci.common.NIOFileHandle.buffer(NIOFileHandle.java:650)
at loci.common.NIOFileHandle.seek(NIOFileHandle.java:330)
at loci.common.RandomAccessInputStream.seek(RandomAccessInputStream.java:203)
at loci.formats.in.DicomReader.initFile(DicomReader.java:676)
at loci.formats.FormatReader.setId(FormatReader.java:1466)
at loci.formats.ImageReader.setId(ImageReader.java:863)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:660)
at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1043)
at loci.formats.tools.ImageInfo.main(ImageInfo.java:1129)
(java11) sbesson@sebastiensmbp2 DICOM % ~/Documents/GitHub/bioformats/tools/showinf -noflat -crop 10240,10240,2048,2048 HandEcompressed_Scan1_2k/HandEcompressed_Scan1_2k_0_0.dcm
Checking file format [DICOM]
Initializing reader
DicomReader initializing HandEcompressed_Scan1_2k/HandEcompressed_Scan1_2k_0_0.dcm
Verifying DICOM format
Reading tags
Exception in thread "main" java.lang.IllegalArgumentException: newPosition > limit: (1944 > 1943)
at java.base/java.nio.Buffer.createPositionException(Buffer.java:318)
at java.base/java.nio.Buffer.position(Buffer.java:293)
at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:1094)
at java.base/java.nio.ByteBuffer.position(ByteBuffer.java:262)
at loci.common.NIOFileHandle.buffer(NIOFileHandle.java:650)
at loci.common.NIOFileHandle.seek(NIOFileHandle.java:330)
at loci.common.RandomAccessInputStream.seek(RandomAccessInputStream.java:203)
at loci.formats.in.DicomReader.initFile(DicomReader.java:676)
at loci.formats.FormatReader.setId(FormatReader.java:1466)
at loci.formats.ImageReader.setId(ImageReader.java:863)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:660)
at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1043)
at loci.formats.tools.ImageInfo.main(ImageInfo.java:1129)
- both datasets open without issue when selecting -format MinimalTiff
- all other converted HandEcompressed_Scan1 datasets open as expected
Thanks for the reviews, @sbesson and @dgault. I could not reproduce the 2 errors in #3911 (review) with this PR's branch alone, but I can reproduce them with the most recent merge build's HEAD~ (801bddb). I was very confused as to why that would be the case, but realized there is a 3 byte difference in length between files written with the merge build and files written with the branch. These 3 bytes are accounted for by the difference in the version number; I believe the smaller version string combined with the implicit-length …
@fedorov / @dclunie: mentioned via email as well, but for completeness could you please let us know here or via private email what acknowledgement text you prefer to see in the release notes? #3742 (comment) is what we used in the past, but I'd assume that requires some changes.
Retested the reading of all converted DICOM filesets created in #3911 (review) with the last commit.
Everything now works as expected; the only broken data is the CMU-3 files converted without compression, and all other converted datasets initialize and read binary data identical to the source file. Happy for this to be included in 7.0.0 in its current state.
Also specifically on #3911 (comment), retested the conversion of the NDPI file mentioned in the linked issue and the original image.sc thread. Confirmed that the PhotometricInterpretation is now set to YCbCr when using JPEG compression and RGB otherwise. Does that mean the associated issue should be closed by this PR?
(base) sbesson@sebastiensmbp2 Downloads % tiffinfo ndpi_he.ome.tiff | grep "Photometric Inter"
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
(base) sbesson@sebastiensmbp2 Downloads % tiffinfo ndpi_he_jpeg.ome.tiff | grep "Photometric Inter"
Photometric Interpretation: YCbCr
Photometric Interpretation: YCbCr
Photometric Interpretation: YCbCr
Photometric Interpretation: YCbCr
Photometric Interpretation: min-is-black
Photometric Interpretation: YCbCr
Photometric Interpretation: YCbCr
Photometric Interpretation: min-is-black
Photometric Interpretation: YCbCr
Photometric Interpretation: min-is-black
(base) sbesson@sebastiensmbp2 Downloads % tiffinfo ndpi_he_j2k.ome.tiff | grep "Photometric Inter"
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
Photometric Interpretation: RGB color
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
Photometric Interpretation: RGB color
Photometric Interpretation: min-is-black
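The listings show the rule being verified: JPEG-compressed colour planes are tagged YCbCr, uncompressed and JPEG-2000 planes stay RGB, and the interleaved min-is-black entries correspond to single-channel sub-images. A sketch of that decision (illustrative only, not the actual writer code):

```python
def photometric_interpretation(channels: int, compression: str) -> str:
    """Pick a TIFF PhotometricInterpretation consistent with the
    tiffinfo output above. Illustrative sketch only."""
    if channels == 1:
        return "min-is-black"          # greyscale sub-images
    if compression == "JPEG":
        return "YCbCr"                 # JPEG codestreams are YCbCr-encoded
    return "RGB color"                 # uncompressed and JPEG-2000 stay RGB
```

Declaring YCbCr for JPEG matters because most JPEG encoders transform RGB to YCbCr internally; tagging the TIFF as RGB would make readers skip the colour conversion and display wrong colours, which is the issue this confirms is fixed.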
Agreed, I have updated the description so that #3856 should be closed automatically when this is merged.
The DICOM writer now includes TIFF metadata as described in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6489422/.
This requires more testing before it is ready for final review, but I would expect basic test cases to produce files that can be read as either DICOM or TIFF. For example, using a public brightfield slide (https://downloads.openmicroscopy.org/images/Vectra-QPTIFF/perkinelmer/PKI_scans/HandEcompressed_Scan1.qptiff):

- tiffinfo and/or tiffdump should be able to read the output qptiff-test*.dcm files.
- showinf -format MinimalTiff on any of the output files forces Bio-Formats to read as TIFF, and should display an image of the correct size. For the larger resolutions, it may be necessary to use the -crop x,y,w,h option to read a smaller region from the full image.
- As discussed with @fedorov, @dclunie, and @erindiel, only basic TIFF metadata is written. There is no OME-XML written, so these won't be recognized as OME-TIFF (much less pyramid OME-TIFF). As such, when reading as TIFF each file will be detected individually and the pyramid/label/macro/etc. relationships will not be visible.
- showinf -noflat qptiff-test_0_0.dcm should continue to detect the whole DICOM dataset, just like DICOM data written without this pull request.
- dcdump / dciodvfy should also be able to read the output files as before.

By default, non-BigTIFF files are written, unless the total pixel count approaches 4GB. The -bigtiff option to bfconvert will force BigTIFF metadata to be written instead.

I have not yet extensively tested different compression options, different tile sizes, or a wider range of input data. I would expect some fine-tuning to be needed here as testing continues.
Fixes #3856 as noted in #3911 (review)