# Output File Formats PyReduce produces FITS files containing extracted spectra. This page documents the file structure. ## Spectra Format (v2) The current format stores spectra in a FITS binary table with one row per trace. Files are identified by header keyword `E_FMTVER = 2`. ### Header Keywords | Keyword | Description | |---------|-------------| | `E_FMTVER` | Format version (2 for current format) | | `E_STEPS` | Comma-separated list of pipeline steps run | | `E_OSAMPLE` | Extraction oversampling factor | | `E_LAMBDASF` | Slit function smoothing parameter | | `E_LAMBDASP` | Spectrum smoothing parameter | | `E_SWATHW` | Swath width (if set) | | `barycorr` | Barycentric velocity correction (km/s) | ### Table Columns The binary table extension (named `SPECTRA`) contains: | Column | Format | Description | |--------|--------|-------------| | `SPEC` | `{ncol}E` | Extracted spectrum (float32). NaN for masked pixels. | | `SIG` | `{ncol}E` | Uncertainty (float32). NaN for masked pixels. | | `M` | `I` | Spectral order number (see below). -1 if unknown. | | `GROUP` | `16A` | Group identifier ('A', 'B', 'cal', or bundle index). | | `FIBER_IDX` | `I` | Fiber index within group (1-indexed). -1 if unknown. | | `EXTR_H` | `E` | Extraction height used for this trace | | `WAVE` | `{ncol}D` | Wavelength in Angstroms (float64, optional) | | `CONT` | `{ncol}E` | Continuum level (float32, optional) | | `SLITFU` | `{len}E` | Slit function (float32, optional, NaN-padded) | ### Spectral Order Number (`M`) The `M` column contains the physical spectral (diffraction) order number, not a sequential index. In echelle spectrographs, higher order numbers correspond to shorter wavelengths. The order number is assigned during reduction via: 1. **order_centers.yaml**: If the instrument provides this file, traces are matched to known order centers during detection. 2. **Wavelength calibration**: The linelist file contains `obase` (base order number). Each trace gets `m = obase + trace_index`. 3. **Fallback**: For legacy files or MOSAIC mode, `M` may be -1 (unknown) or sequential from 0. The order number is used in 2D wavelength calibration polynomials. See [Wavelength Calibration](wavecal_linelist.md) for details. Each row corresponds to one extracted trace/order. ### Masking Invalid pixels are marked with `NaN` in the `SPEC` and `SIG` columns. This replaces the separate `COLUMNS` array used in the legacy format. ### Reading Spectra ```python from pyreduce.spectra import Spectra # Load spectra (handles both v2 and legacy formats) spectra = Spectra.read("observation.science.fits") # Access individual spectra for s in spectra.data: print(f"Order {s.m}, fiber {s.fiber}") print(f" Wavelength range: {s.wave[~s.mask].min():.1f} - {s.wave[~s.mask].max():.1f} A") # Get stacked arrays arrays = spectra.get_arrays() spec_2d = arrays["spec"] # shape (ntrace, ncol) ``` ## Legacy Echelle Format (v1) Files without `E_FMTVER` or with `E_FMTVER < 2` use the legacy format. ### Structure The binary table has a single row containing flattened 2D arrays: | Column | Format | Description | |--------|--------|-------------| | `SPEC` | `{ntrace*ncol}E` | Flattened spectrum array | | `SIG` | `{ntrace*ncol}E` | Flattened uncertainty array | | `WAVE` | `{ntrace*ncol}D` | Flattened wavelength array | | `CONT` | `{ntrace*ncol}E` | Flattened continuum array | | `COLUMNS` | `{ntrace*2}I` | Column range [start, end] per trace | The `TDIM` keyword stores the original shape as `(ncol, ntrace)`. ### Key Differences from v2 | Aspect | Legacy (v1) | Current (v2) | |--------|-------------|--------------| | Table rows | 1 (flattened) | ntrace (one per spectrum) | | Masking | Separate `COLUMNS` array | NaN in data | | Order info | Not stored | `M` column | | Group info | Not stored | `GROUP` column | | Fiber index | Not stored | `FIBER_IDX` column | | Extraction height | Not stored | `EXTR_H` column | | Slit function | Separate files | `SLITFU` column | ### Reading Legacy Files `Spectra.read()` automatically detects and handles legacy files: ```python from pyreduce.spectra import Spectra # Works for both formats - auto-detects via E_FMTVER header spectra = Spectra.read("old_file.fits") # Access data the same way regardless of original format for s in spectra.data: print(f"Order {s.m}: {len(s.spec)} pixels") ```