supported tiled gzip_2 compression in fits files

jason
Posts: 44
Joined: Thu Apr 18, 2019 3:53 am

Wed Sep 18, 2019 12:20 am

Per cfitsio documentation:

https://heasarc.gsfc.nasa.gov/docs/soft ... ssion.html
https://heasarc.gsfc.nasa.gov/docs/soft ... ode41.html

There's also hcompress (Haar-transform compression, details at http://www.stsci.edu/software/hcompress.html ), which is what is used in the DSS (I hope to do something better someday...). It is for people who can tolerate compression losses below certain thresholds with well understood error characteristics, which is why it was allowed to be used in the DSS.

You're going to want to use tiled GZIP_2 for lossless compression. Tiling gives localized compression: tiles like 128x128 or 256x256 take advantage of local properties in 2D, instead of treating the image as one giant, more "random" looking 1D problem. Why GZIP_2? From the above links:
There are 2 variants of the GZIP algorithm: GZIP_1 compresses the array of image pixel value normally with the GZIP algorithm, while GZIP_2 first shuffles the bytes in all the pixel values so that the most-significant byte of every pixel appears first, followed by the less significant bytes in sequence. GZIP_2 may be more effective in cases where the most significant byte in most of the image pixel values contains the same bit pattern. In principle, any number of other compression algorithms could also be supported by the FITS tiled image compression convention.
Shuffling bytes across pixels/signals can let LZ family algorithms be much more effective. This will be more effective and much faster than the recently added bzip2 encoding, and should work just as easily for most software that uses the library to read FITS files correctly. Otherwise, utilities or Python can uncompress the files.
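
To make the byte-shuffling idea concrete, here is a minimal sketch using only Python's standard library. It is not cfitsio's implementation: the function names, the synthetic "flat background plus noise" data, and the use of zlib (the same DEFLATE algorithm, without gzip framing) are all assumptions made for illustration.

```python
import random
import struct
import zlib


def shuffle_bytes(raw: bytes) -> bytes:
    """Reorder big-endian 16-bit pixels: all high bytes first, then all low bytes."""
    return raw[0::2] + raw[1::2]


def unshuffle_bytes(shuffled: bytes) -> bytes:
    """Invert shuffle_bytes by re-interleaving the high and low byte planes."""
    half = len(shuffled) // 2
    out = bytearray(len(shuffled))
    out[0::2] = shuffled[:half]
    out[1::2] = shuffled[half:]
    return bytes(out)


def synthetic_tile(n_pixels: int = 256 * 256, seed: int = 0) -> bytes:
    """Hypothetical test data: a flat background of 1200 counts plus 6 bits of noise."""
    rng = random.Random(seed)
    values = [1200 + rng.randint(0, 63) for _ in range(n_pixels)]
    # FITS stores integers big-endian, so the high byte of each pixel comes first.
    return struct.pack(">%dH" % n_pixels, *values)


def compare_sizes(raw: bytes) -> tuple:
    """Return (deflate only, shuffle + deflate) compressed sizes in bytes."""
    return len(zlib.compress(raw)), len(zlib.compress(shuffle_bytes(raw)))


if __name__ == "__main__":
    raw = synthetic_tile()
    plain, shuffled = compare_sizes(raw)
    print("raw bytes:        ", len(raw))
    print("deflate only:     ", plain)
    print("shuffle + deflate:", shuffled)
```

In this synthetic case every pixel has the same high byte, so after shuffling the first half of the buffer is one long run that DEFLATE encodes almost for free. Real frames will behave less cleanly, which is why the benchmarks matter.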

I've done this comparison before in other domains. You will not easily beat this compression without going to wavelets/lossy methods or changing the LZ algorithm (which would then go outside of cfitsio). It's great that it's still very fast: in some cases, because of disk read times, reading the compressed file can be faster than reading the uncompressed one when the data is not already in memory.


Here is a third-party comparison from 2009. I might be missing something, but there was nothing positive-sounding about bzip2 in it: https://heasarc.gsfc.nasa.gov/fitsio/fp ... report.pdf

I will provide a few benchmark images and sizes shortly

This thread was spawned by http://www.prism-astro.com/forum_us/vie ... &t=454#top, which was locked. I couldn't think of a better way to make the suggestion than a new thread... somewhere.
Last edited by jason on Thu Sep 19, 2019 5:32 pm, edited 2 times in total.
jason
Posts: 44
Joined: Thu Apr 18, 2019 3:53 am

Thu Sep 19, 2019 1:50 am

astro-image-compression-benchmarks.txt
Benchmarks post
cfitsio 3.47 was used for the FITS compression methods (including fpack).
For jpeg2000, PixInsight was used to encode. *Lossless compression was used*. Tiled images were also benchmarked, and tiling did provide an improvement (a result I did not expect).
WebP was likewise produced with PixInsight (1.8 Ripley).
For CPA, Prism was of course used - 10.3.60.
For the FITS methods, tile sizes of 128, 256, and 512 were tried, with diminishing returns at each level. 256 was chosen, somewhat arbitrarily, as a happy medium.
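
For anyone who wants to play with tile sizes without cfitsio, here is a rough stdlib-only Python sketch of the tiling idea: split the image into square tiles, byte-shuffle each tile, and compress each one independently. It is only an illustration and does not produce FITS files (cfitsio stores the compressed tiles in a binary table). The function names, the synthetic gradient-plus-noise image, and the use of zlib are assumptions, and the sizes it prints say nothing about real data.

```python
import random
import struct
import zlib


def shuffle16(raw: bytes) -> bytes:
    """GZIP_2-style shuffle for big-endian 16-bit pixels: high bytes, then low bytes."""
    return raw[0::2] + raw[1::2]


def unshuffle16(shuffled: bytes) -> bytes:
    """Re-interleave the two byte planes produced by shuffle16."""
    half = len(shuffled) // 2
    out = bytearray(len(shuffled))
    out[0::2] = shuffled[:half]
    out[1::2] = shuffled[half:]
    return bytes(out)


def compress_tiled(pixels, width, height, tile):
    """Compress a row-major 16-bit image as independent tile x tile blocks."""
    blobs = []
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1 = min(x0 + tile, width)  # edge tiles may be narrower
            values = []
            for y in range(y0, min(y0 + tile, height)):
                values.extend(pixels[y * width + x0 : y * width + x1])
            raw = struct.pack(">%dH" % len(values), *values)
            blobs.append(zlib.compress(shuffle16(raw)))
    return blobs


def decompress_tiled(blobs, width, height, tile):
    """Rebuild the row-major pixel list from the blobs made by compress_tiled."""
    pixels = [0] * (width * height)
    remaining = iter(blobs)
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1 = min(x0 + tile, width)
            raw = unshuffle16(zlib.decompress(next(remaining)))
            values = struct.unpack(">%dH" % (len(raw) // 2), raw)
            n = x1 - x0
            for row, y in enumerate(range(y0, min(y0 + tile, height))):
                pixels[y * width + x0 : y * width + x1] = values[row * n : (row + 1) * n]
    return pixels


def make_test_image(width, height, seed=0):
    """Hypothetical input: a slow background gradient plus 5 bits of noise."""
    rng = random.Random(seed)
    return [1000 + (x + y) // 8 + rng.randint(0, 31)
            for y in range(height) for x in range(width)]


if __name__ == "__main__":
    w = h = 512
    img = make_test_image(w, h)
    for tile in (128, 256, 512):
        blobs = compress_tiled(img, w, h, tile)
        assert decompress_tiled(blobs, w, h, tile) == img  # lossless round trip
        print("tile", tile, "->", sum(len(b) for b in blobs), "bytes")
```

Because each tile is compressed on its own, a reader can decompress just the region it needs, which is part of the appeal of tiling beyond the compression ratio.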

benchmark inputs (all 16 bit grayscale)
https://www.dropbox.com/sh/0klh078viqo1 ... vQd1a?dl=0
Spreadsheet:
https://www.dropbox.com/scl/fi/sc5bwqw0 ... h3cxgq0rg2

These inputs are not exhaustive; they are data I shot myself over a period of months and at different times. I'd be happy to add other images of interest.

In all cases tiled gzip_2 was a front runner for lossless compression, with compression factors from 1.7x to 6x, and it is surprisingly close to wavelet approaches at times.

And yes - put as neutrally as possible - it bested CPA in all cases. Different data can work better with some methods than others, so I cannot say each method was tested on the data that suits it best. But as a robust, simple, fast, and very effective compression method, tiled gzip_2 with a tile size of 256x256 should be used, probably by default.

Unfortunately PixInsight couldn't open the tiled/compressed files. I will have to show them these results and get them to support it there too; hopefully it's no big deal. I'll push for Sequence Generator Pro to do the same.