Disclaimer:   We try to keep this up to date, but you should be sure to understand what you are doing!

ALFALFA Survey - Data Reduction Cookbook     Current as of March 11, 2010

Getting Started:
Before you start, read through this whole document. We are not kidding! Processing ALFALFA data to Level I is not rocket science, but it is not monkey work either. We are sure that, with a bit of training and some careful thought, you can contribute to ALFALFA in a meaningful way by helping us with this part of the overall ALFALFA effort. But, it is **really** important to do the job correctly, so (1) ASK IF YOU ARE UNSURE and (2) TAKE NO SHORTCUTS!

You may also wish to work through the practice questions found at the ALFALFA practice page, especially the introduction to ALFALFA drift scan data.

NEW March 2010!   An updated version of FLAGBB is on-line; consult section 5 of this document for changes!

Getting to Level I: Calibration, bandpass subtraction, baselining and inspecting ALFALFA Data

R. Giovanelli, February 28, 2005
B. Kent, August 16, 2005
Numerous others since then.... :)
Martha, Jan 19,2009

0. Compiling procedures in your IDL session

You will need to load a number of IDL procedures to process the data. The locations of these files should be defined in your .idlstartup file. (At Cornell: /home/dorado3/galaxy/idl/idl_alfa; see instructions for setting up IDL at Arecibo.) Load via the call:

IDL> @wasinit2

The ALFALFA specific procedures (at Cornell stored in /home/dorado3/galaxy/idl/idl_alfa) are loaded via the call:

IDL> @alfinit

If the idl_alfa directory is not in your path, you can load using:

IDL> @/home/dorado3/galaxy/idl/idl_alfa/alfinit

When references are made to procedures without further explanation, they refer to programs stored in the idl_alfa directory. Those programs (well, most of them...) have explanatory descriptions at the beginning, regarding input and output parameters, keywords, and how they function. See those for further understanding.

Please review the ALFALFA file naming conventions and Kristine's tips on Level I processing

1. Conversion of FITS files to IDL .sav files

This step is usually done by the designated observer.
Raw FITS files are converted to a WAS Hdr type format based on correlator routines written by Phil Perillat at NAIC. The drifts are grouped into SAVE files of dimensions [npols, nrecs, nboards] - 2 by 600 by 8. The "d" structure contains the WAS header, a copy of the FITS header, position and setup information, and a data array of dimension N channels (currently 4096).

A FITS file consists of a 600 drift scan for 8 boards and two polarizations, as well as a Cal ON scan. Four IDLE SAV files are created from each FITS file.. 1) A drift file nnnnnnnnn.sav. 2) A Cal OF begin file - the first record from the drift to be used as part of a cal off. 3) A Cal OF end file - the last record from the drift to be used as part of a cal off. 4) A Cal ON file.

A list of all FITS files to be converted is created with complete path names to the files. The file list is named in the format filesyyyymmdd.txt. The program filecreator is then invoked in IDL:

IDL> filecreator, 'filesyyyymmdd.txt'

The process is automatic, and should be run on one of the faster machines at AO as soon as a night's observing run is complete.

2. Getting a session calibration solution from cal scans: Calib1

This step is usually done by Martha before you take over (unless you are the designated observer and reducing the data yourself at Arecibo).
In step 1, we have produced a set of idl .sav files, each containing a drift structure. In addition to the standard drifts, labeled plainly nnnnnnnnn.sav, which contain the actual 600-record drift scans, there are files named, e.g.


which come by "triplets", provided there is no breach of continuity in the data taking process. Each of those files contains a scan with a single 1-sec record; the format is the same as for the other drift scans. The scan name with the "CALON" extension corresponds to a scan in which the cal was fired for one second, at the end of each regular 600 sec drift scan. The one with the "CALOFend" extension is the last record of the drift preceding the firing of the cal, and that with the "CALOFbegin" extension is the first record of the drift immediately following the firing of the cal. An average of the spectra of CALOFend and CALOFbegin will constitute the cal OFF record.

Calib1 will generate two arrays of structures, named respectively dcalON and dcalOFF, which will contain the spectra of each of the calibration events. The format of these structures is identical to that of the drift structures. A "calibration event" is a set of cal ON, cal OFF records, in our case a triplet as listed above.

Calib1 is run on the data of a whole data taking session, or on subsets thereof in case some major change took place during that session which may have reset the scaling of the data, altered the cal values, changed the LO chain, etc. Then you'll run separately Calib1 for the data taken before and after the change.

In order to run Calib1, you'll need a listing of the files containing the cal events. In the directory containing the data of a data taking session, there should be a file containing all the names of the idl .sav files with "CAL" embedded in the name: cal051209a.list . This file is created when filecreator is run at the end of an observing session. You need to copy this file into your working directory and edit the file so that it is organized by triplets. For example, in a normal data taking session in which 5 drift scans were taken, you'll have something like:


You'll want to delete the first line, 504507670CALOFbegin.sav, for the Cal OFF record of the first drift is not a part of a triplet (although the program will run even if you leave it on, and ignore it). Similarly, you will want to add an OFF cal after the last line, so that the last Cal ON is in a complete triplet. Do so by repeating the last cal OFF record. The final CALlist file will look like:


You will need to be careful in preparing this file, if there was some breach of continuity in the data taking, e.g. aborted scans, drifts not followed by Cal ON scans, power readjustment, etc. Even if you have to sacrifice some Cal data, always restrict the list to good triplet sets only. It is always best to make sure that the data you are including in the calibration is OK.

You can now run Calib1.

Define the directory in which the idl .sav files are located, e.g.:

IDL> dir='/home/arecibo2/galaxy/idlraw/05.02.14a'
IDL> CALlist='this is the file name of your CALlist'
IDL> calib1,CALlist,dcalON,dcalOFF,dir=dir

Alternatively, have the names of the CAL files specified in CALlist with the full address, directory and subdirectory from "/" downhill. Then you can ignore specifying "dir=dir" in the call to Calib1.

The procedure will silently load all the calibration data and return the structures dcalON and dcalOFF at the IDL prompt level. You can check that it worked:

IDL> help,dcalON,/st
You may want to save them in an IDL sav file for safety:

IDL> save,dcalON,dcalOFF,file='dcals_05.02.14a.sav'

3. Getting a session calibration solution from cal scans: Calib2

This step is usually done by Martha before you take over (unless you are the designated observer and reducing the data yourself at Arecibo).
Once you have dcalON and dcalOFF available, you can obtain runs of crat and Tsys through your observing session. As a reminder, crat is the value in K of 1 instrumental (spectrometer) unit. Calib2 produces a structure named ncalib:

IDL> calib2,dcalON,dcalOFF,ncalib
You will be asked

Any bad boards in system?
Answer "n" if none.

You will be engaged in interactive approval of the calibration process as the program steps through each beam. Most of the time, you will OK each beam by hitting enter. A typical example is shown here. Oddities which require expert opinions include big spikes outside the flagged windows or double spectra.

Next the calratios (Cal/TPcalON-TPcalOFF) and Tsys will be displayed for the drift. Click for examples of a calratio fit and a Tsys plot. Note the amplitude of the variation in calratios. The calibration performed here is to the first order (it will be improved later in the reduction process.) The difference between points should be on the order of 1K, with δT /T on the order of 1/25. If OK, hit 'enter' to fit by default and then enter the order of the fit as makes sense; lower order is better, but if curvature is evident and warranted, then set N = 2 or 3. (Think here about the statistical significance of the fit. Does the data warrant fitting a curve really, or is N=1 just as statistically valid?) The other two options are to exclude data points or to restart the fitting process.

Note: If something unusual takes place during data taking in the given session, it may be necessary to break the session data into segments, each characterized by a different set of dcalON, dcalOFF and ncalib. You will notice that while running Calib2, as the values for each calibration event are separately displayed for each beam/pol configuration. If, for example, a "step" appears in the values of crat, you may want to process separately the calibration data before and after the step; hence two different ncalib structures will be necessary to properly calibrate the session's drifts.

Check that the ncalib file was created by typing:

IDL> help,ncalib,/st
Add a comment to the log file, providing the number of cals in the set and comment that

The structure ncalib is used in the calibration of the drift data. Save the ncalib structure for future use, e.g.:

IDL> save,ncalib,file='ncalib_050214a.sav'
If you want, you can plot some of the values:

IDL> restore,'ncalib_05.02.14a.sav'
IDL> plot,ncalib.rahr0,ncalib.decdeg0,psym=4

4. Bandpass Subtraction and Scaling, Calibration Monitoring and Position Recording

This step is usually done by Martha before you take over (unless you are the designated observer and reducing the data yourself at Arecibo).
The next step in processing involves calling a procedure called bpd, which is a grab-bag of many operations: it reads the raw "d" structure of each drift, computes a bandpass, applies a baseline, extracts continuum data, scales spectra and continuum in K, monitors galactic HI for possible changes in the calibration records positions sampled in the sky, and stores output in a .sav file, besides leaving it for access at the idl command line level.

Read the descriptions of both procedures in the files bpd.pro and bpc.pro, to learn the nitty-gritty of the processing. Procedure bpd calls a number of other procedures, most notably bpc and make_caldrift. It needs as inputs:

The output of bpd consists of:

The procedure bpdgui runs via a GUI interface.

4.1 Running bpdgui

You invoke the program as:

IDL> bpdgui

A graphical user interface will pop up and you will enter several parameters using the various widgets in the gui. Refer to this example.

First, click on the "browse" button in the upper right part of the gui and a widget will appear prompting you to select the IDL sav file containing the appropriate ncalib structure for the data you are about to process. Click on that file name and you'll see values for Tcals, median Crat and median Tsys filling the slots near the center of the gui. The Tsys values should be on the order of 30K. The plot in the lower part of the gui illustrates the mask function used for the measurement of total power for calibration purposes.

Second, click on the "browse" button in the upper left of the gui, and a widget will appear with the list of drift files. Select the list that is appropriate. It will take about 15-20 seconds to return control (the program is loading the first file from disk). A widget will appear in the upper right asking you to "OK" the list of N drift scans to be processed. The number of drift files should be the same as the number of cal files in Section 3.

Third, click on the appropriate button to decide whether the Calsession and Runpos structures are to be appended to or new ones started. You will have to enter the names of the files being created or appended to. Also it is important to make sure that CRAT is set to 'fit' NOT 'median'. Sample names include calsession_060318.sav and pos_060318_+083100s.sav. In the latter, the +083100 is the night's drift declination and the following "s" indicates this is a spring drift. If a night is spread across more than one drift, make multiple pos files with the different declinations indicated.

Fourth, you will have to set the "interpolation regions" in the bandpass creation process. Bandpass interpolation will involve principally (and perhaps only) the galactic HI line. BPDGUI has a default choice for the interpolation region (channels 3480 to 3520), which may suit most drift scans. However, it may be desirable to tighten the region, so that the bandpass will be more precise in the galactic HI spectral domain. Also galactic HI will shift around both frequency and channel number, during an observing session, and what fits a drift scan may not fit drift scans taken one or more hours later. You should examine about 3 spectra (beginning, middle, end). Refer to this example.   For each spectrum, click on the file in the "raw files" window; after it loads click on the button that says "Interp region" and a set of options will appear, as well as a spectrum of the bandpass that you can interactively inspect. You may then select your own set of interpolation channel values, or select the default ones. Hit "zoom" to change the scale. Identify the channels where the galactic HI begins to rise, click "done" and go on to the others. Find the maximum range needed for your spectra. Then click on any one of the entries in the "Interp regions" box. The Interp Regions Edit widget will appear. Enter these new values (usually only 2) and then click "apply to all". The values in the display should update. Record the interpolation values in the log file. Note that this is a delicate part of the process, and you should be careful with it. Here is an unusual example, in which the interpolation window was enlarged to exclude M33!

OK, you're ready to BPD. Click on "process BP" and let it rip.

It should take about 7-10 seconds to precess each pol/beam configuration, i.e. about 2.5 min for a full drift.

Often, when BPD finishes, there are some warning messages like
% Program caused arithmetic error: Floating underflow
% Program caused arithmetic error: Floating overflow
% Program caused arithmetic error: Floating illegal operand
These should not indicate a real problem, so if the pos file is written correctly and BPD seems to terminate normally, it is safe to assume all is well.

After running BPD, check that the pos file has the correct dimension:

IDL> restore,'pos_050305_+093642s.sav'

IDL> help,pos

This should tell you that "pos" is array with dimension equal to the number of drifts that were processed.

It is recommended that you exit IDL at this point and restart it (to clear memory). You may also want to make a copy of the pos file, just in case...

5. Inspection of the data: flagbb and reviewbb

This is where you pick up if Martha has already run the calibration and bandpass subtraction. Notice that when you pick up the files, the directory contains, in addition to the drift data and the pos, ncalib and calsession IDL "sav" files, several anciallary files including a log file (e.g. "log_processing_YYMMDD") and a starter bad box file ("bb"). You should keep records in the log file, adding on to where Martha has left off. Be sure to erase from that file the instructions (leaving only the relevant information). Also you need to understand the steps that you did not have to perform, so that you know what has happened to your data up to now. If you are not already familiar with the process, check out the practice questions , especially the introduction to ALFALFA drift scan data.

NOTE: March 2010 Changes to FLAGBB:

5.1 flagbb

Once the drift data have been bandpass calibrated, you will visually inspect the data - ONE BEAM/POL AT A A TIME - and flag bad parts of each 2-D map. This is also the first time in the data processing flow that you get to take a close look at the data. You will thus use this module to: (i) inspect the data (ii) flag "bad boxes" (iii) enter comments in your reduction logfile. The coordinates of the regions of bad data in the 2-D map ("badboxes") are logged by the program in the "position" structure, previously created by module BPD (In the previous section we referred to it as "runpos"). The badbox information is contained in the pos structure as POS.BADBOX. BPDGUI inserts a default badbox, which you will edit using flagbb.

5.1.1 Calling flagbb

To begin, open the position sav file in IDL:

IDL> restore,'pos_050305_+093642s.sav'

Next restore a reduced data .sav file produced by BPDGUI, e.g.:

IDL> restore,'d080848+102030.509371189.sav'

The preceding command will load the dred structure, as well as a number of ancillary structures and arrays associated with that drift scan.

The inspection program is called flagbb. It is invoked with this general format:

IDL> flagbb,dred,cont_pt,cmask,pos,GAUSAV=gausav,HAN=han,AGC=agc,$
        N1=n1,N2=n2$, RFIMOD=rfimod,MASK=mask

Your standard way of calling will be:

IDL> flagbb,dred,cont_pt,mask,pos,GAUSAV=11,HAN=3,AGC=1,n1=0,n2=6



The following keywords are almost never used:

If the keyword AGC is set, the program will overplot on each image AGC galaxies in the vicinity of the drift. "Vicinity" is decided by the program on the basis of the size of the galaxy (if size is listed in the AGC). To do this, the program reads an ancillary file with all AGC gals within the Arecibo dec range. Information about the galaxies, including any previous HI flux detections, is printed to the screen.

Once called, flagbb will query the user about the intensity scaling (usually hit enter for default) and then ask what BADBOX information to use. The first time you invoke flagbb, choose (d) for default. The next sections describe the format of the BADBOX tag and how to refine the bad boxes.

5.1.2 Description of POS.BADBOX

The POS.BADBOX tag is an array of integers (100,2,8,4). It contains the coordinates of the corners of up to 100 boxes containing data that are deemed "bad", an especially useful tool when the data will be used for gridding and producing clean cubes. The first dimension of .BADBOX is the box number; the second is the pol number, the third is the beam number and the fourth contains 4 numbers which are LLch,LLrec,URch,URrec, i.e. the channel number and record number of the lower left (LL) corner of the box and the channel number and record number of the upper right (UR) corner of the box. Beam number 7 in this structure is a special value used internally that represents a potential box not necessarily active in any beam, but that has known dimensions and is ready to be added with 'a' or 'A'. The box numbers are defined for specific classes of bad data:

5.1.3 Description of flagbb display

When flagbb is invoked, it uses Phil Perillat's imgdisp procedure to display, one strip and pol at a time, position-frequency maps where the x-axis is channel number and the y-axis is record (sample) number along the strip. Click here for an example. In this screen, you will see:

  • right side - the continuum flux profile along the strip
  • upper image - the position-frequency map of the drift, showing active badboxes plotted as red boxes
  • middle plot - the mask used to produce the total power continuum within the bpc program (Section 4). This mask shows the pixels that were excluded in the computation of total power.
  • lower plot - the average spectrum over the whole drift.

    The user actually works only on the position-frequency map. The continuum mask and averaged spectrum are displayed purely as references. The continuum mask follows a conservative approach and finds many more "bad" channels than there really are in the data. It is however useful to have it as a visual sanity check. The averaged spectrum can be used to help judge the significance of a feature and better determine the box edges.

    5.1.4 Refining POS.BADBOX using flagbb

    Default .BADBOXes (0-20) are indicated by red numbers along the top of the map in the flagbb image display. "Active" boxes are plotted as red rectangles and labeled under each rectangle along the bottom of the map. These default .BADBOXes extend over the full duration of a drift (i.e. they are BAD CHANNELS).

    Since RFI features came and go from one day to the next, but persisting generally for the duration of an observing session, your first task is to refine the list of default .BADBOXes so that it will be close to correct for each of the scans in your observing session. In essence, you build a "memory" of .BADBOXES for your session, which should avoid having to reset them for each drift. In addition to the "default" .BADBOXES, the program allows for writing a badbox file, dimensioned [100,2,8,4], containing all the settings used for a given drift (see Section 5.1.2). At the end of processing a drift, the user has the option of saving the new set of .BADBOXes. At the beginning of the inspection of each drift, the user can (a) read in that file in the internal .BADBOX, then modify it at will or (b) read in the default .BADBOX file, specified inside the program itself.

    After each pol/beam map is smoothed and shown, you can

    You can now :  - delete a box for this map only                       (d boxnr)
                   - delete a box for this + all subsequent maps          (D boxnr)
                   - activate a box for this map only                     (a boxnr)
                   - activate a box for this + all subsq maps             (A boxnr)
                   - modify a box for this map only                       (m boxnr)
                   - modify a box for this + all subsequent maps          (M boxnr)
                   - insert a new box for this map only                   (i)
                   - insert a box for this + all subsequent maps          (I)
                   - modify status flag for board or part thereof         (s)
                   - obtain a row Total and identify peaks                (t)
                   - undo all peaks found by "t"                          (u)
                   - get an image of specified size                       (g)
                   - HVC catalog maps                                     (h)
                   - go back to previous map                              (b)
    	       - exit & go to next map                                (e)

    Description of Options:


    1. the program overwrites the POS structure only after the whole drift has been processed, so if there is a crash, you should start from the beginning of the drift.
    2. The program does not save POS after each drift, so it is your responsibility to do so as often as you see fit, and certainly when you are done with an observing session. It is safest to save after each drift. Save POS this way:
      IDL> save,POS,file='pos050322.sav'
      where 050322 is the date of the observing session.
    3. If you goofed in setting parameters of a box, you can always modify them or delete them
    4. If you noticed that a .BADBOX should have been activated or deleted in a given map, use the "b" option to return to that map. This will not affect any changes you made to the current map.
    5. You can also set or reset boxes manually after you exit flagbb. Say you forgot to deactivate box 2 in beam 4, pol 0, while you were processing the drift number 13 in the series of the observing session. After you are done with flagbb
      IDL> pos[13].badbox[2,0,4,*]=0
      will do the trick. or you can start from scratch on drift 13.
    6. To inspect the job of box setting you've done, you can invoke the program reviewbb (Section 5.2). It is invoked with the same parameters as flagbb, but it does not change anything; it just shows you what you've set.
    7. If the user goes through the data of a drift and realizes that a given map needs to be revisited, the program can be reloaded only for the specific beam (n1=beam to redo, n2=beam to redo); however, both pols will have to be redone, rather than only the one that needs redoing (make sure to use the saved pos file in flagbb not to lose the bad boxes of the other beams).

    Simple flagging rules
    0. Understand what you are doing. If this process were trivial, you wouldn't be doing it; we'd get our cats to do it.
    1. Pay close attention! It is IMPERATIVE that you do a good job. A bad job is worse than no job and will earn you snickers (and we don't mean the candy bar!).
    2. Be sure to use the "t" option.
    3. Know how big to make your boxes; use the "m" key when you need to!
    4. Know what is a continuum source; do not flag continuum sources.
    5. Know what is galactic hydrogen; do not flag galactic hydrogen (or HVC's!)
    6. Know when to set the status flag via the "s" key.
    7. Save the pos file regularly; occasionally make a copy of it.

    5.1.5 Logging comments on data quality and detected galaxies

    As you process each drift, you should log comments about the data quality and detected galaxies. The default log file contains example comments for a drift. Be aware of and log the following:

    5.2 reviewbb

    reviewbb is very similar to flagbb, except that it modifies nothing. It is just used to review the modifications made by flagbb. You call it via

    IDL> reviewbb,dred,cont_pt,mask,pos,GAUSAV=11,HAN=3,AGC=1,N1=n1,N2=n2

    6. Producing and examining a smooth version of the data: strip_pv and inspect

    It is best to smooth the data before inspecting further, so you can see deeper into the noise. In order to produce a smooth display of a dred position-velocity (time-frequency) drift file, you will use strip_pv, which produces and displays a smoothed version of dred. You can then use inspect to display and inspect the data.

    Before the smoothing step, you should check that you did not miss any drifts in the session by insuring that box 0 is assigned in each one:

    IDL> print,where(pos.badbox[0,0,0,2] eq 0)
    If any number is returned, then that drift (remember to start from "0" when you count down the list) has not been flagbb'd correctly. If you get "-1", then you are ready to smooth!

    To run strip_pv:

    IDL> strip_pv,dred,cont_tot,msmooth,n1=0,n2=6,showpol=3,gausav=11,han=5,rfimod=0,units=1


    Once you have msmooth, you can display the data and inspect it with inspect, which uses the display tool atv. Call inspect like this:

    IDL> inspect, msmooth,showpol=3,nstrip=2, agc=1

    where nstrip is the beam number and agc can be used to turn on the AGC catalog display. In this example, beam 3 will be displayed with polarization averaged.

    Inspection options:

    7. Final steps: checking that your flagging job is done When you think that you have completed the flagging for an entire session, you should following the instructions at the end of the log template to verify that all is well.

    IDL> print,where(pos.badbox[0,0,0,2] eq 0)
    If "-1" is returned, then you are ready to run the last check. If some other number appears, then you have not flagged the boxes in the drift corresponding to that entry in the pos file (remember that IDL starts counting from 0). Go back and fix it. Otherwise, proceed to check that all of your boxes are valid:

    IDL> check_pos,pos
    If any problems are found, then go back and fix them. If, on the other hand, checkpos returns no errors, you are done! Notify Martha at this point. She will be very happy to hear from you.

    'Features' at Arecibo (Dec '05/Jan '06):

    Original page created by Riccardo Giovanelli and Brian Kent and maintained by the members of the Cornell ExtraGalactic Group including honorary and visiting EGGs. ("Once an EGG, always an EGG!")
    Last modified: Thu Mar 11 13:26:49 EST 2010 by martha