Glitch in ri A/D system Dig Q1

July 2003

    Diego Janches had problems with his meteor data taken 07jul03. His setup was:

How the ri works:

    The ri has four 12-bit A/D converters (I1, Q1, I2, Q2). Each 12-bit sample is sign extended into a 16-bit number. The data from each digitizer is clocked into a 32K by 16-bit fifo by a delayed sample pulse. The data sits in the fifo until a read request comes from the VME bus. At that time a rdFifo pulse is generated and one sample at a time is read out of the fifo. Each 32K by 16-bit fifo is built by combining two 32K by 8-bit fifo chips, one holding the high byte and one holding the low byte of each sample.
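    The sign extension and the split across the two 8-bit fifo chips can be sketched as below. This is only an illustration, assuming the A/D codes are two's complement; the function names are made up for the example and are not part of any ri software.

```python
def sign_extend_12(raw12):
    """Sign extend a 12-bit two's-complement A/D code to a signed integer."""
    raw12 &= 0xFFF
    return raw12 - 0x1000 if raw12 & 0x800 else raw12


def to_fifo_bytes(value):
    """Split a signed sample into the (hi, lo) bytes of its 16-bit word."""
    word = value & 0xFFFF          # 16-bit two's-complement representation
    return word >> 8, word & 0xFF


def from_fifo_bytes(hi, lo):
    """Rebuild the signed 16-bit sample from its hi and lo bytes."""
    word = ((hi & 0xFF) << 8) | (lo & 0xFF)
    return word - 0x10000 if word & 0x8000 else word


# A/D code 0xFFD is -3 counts; it is stored as hi=0xFF, lo=0xFD.
value = sign_extend_12(0xFFD)
hi, lo = to_fifo_bytes(value)
print(value, hex(hi), hex(lo))
```

    For small signals the hi byte is just the sign (0x00 or 0xFF) and the lo byte carries all of the amplitude information.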

Symptoms:

    The problem occurred with digitizer Q1. The data values (A/D counts, with 1 count = 1.2 mV) ranged from +20 to -20 counts. At some point there was a glitch: the value would jump from a negative number to about +250, then to about -250, and then back to +/-20. Once it started, this jumping continued for the rest of the run. The problem did not show up when test patterns were run through the system (the zeros, toggle, and staircase tests were run for many hours without a failure).
 
    A histogram of the meteor data shows that the voltage distribution is skewed. There are about 15e6 samples in the distribution.

    To debug the problem, we took the same baseband noise and passed it through 4 opamps and then into the 4 A/D converters. The signals should then be nearly the same for all 4 A/D converters (the offsets and gains can differ a little). When digitizer Q1 jumps, we can look at the other digitizers to see what it should have done. The plots show the glitch with the same signal input to all 4 digitizers.

    The key to the error is that once it occurs, it continues for hours. This implies that there is a memory somewhere that remembers that the error has occurred. This is what made us suspicious of the fifo chips. The table below shows the Q1 data values (in hex and decimal) around the start of a failure.
17 16-bit samples from the Q1 digitizer at the failure

Col 1     Col 2          Col 3        Col 4    Col 5        Col 6
Data file A/D sample     Data (hex)   Data     Corrected    Corrected
sample    Q1Hi,Q1Lo      Q1Hi Q1Lo    (dec)    data (hex)   data (dec)
                                               Q1Hi Q1Lo
--------  -------------  ----------   ------   ----------   ----------
0         0,0            FF FD        -3       FF FD        -3
1         1,1            00 0E        14       00 0E        14
2         2,2            FF FE        -2       FF FE        -2
3         3,3            00 02        2        00 02        2
4         4,4            FF F9        -7       FF F9        -7
5         5,5            00 02        2        00 02        2
6 ***     6,no rdpulse   FF FF        -1       FF FA        -6
7         7,6            FF FA        -6       FF FA        -6
8         8,7            FF FA        -6       FF F7        -9
9         9,8            FF F7        -9       FF ED        -19
10        10,9           FF ED        -19      FF FB        -5
11        11,10          00 FB        251      00 01        1
12        12,11          00 01        1        00 09        9
13        13,12          FF 09        -247     FF F1        -15
14        14,13          00 F1        241      00 04        4
15        15,14          00 04        4        00 00        0
16        16,15          FF 00        -256     FF XX

*** Sample 6: Q1Lo received no rdpulse. To correct the data, skip the FF lo8 byte read here.

What is the problem:

    The problem is that the rdFifo pulse for fifo Q1Lo was not long enough to reliably clock the fifo output (we could see this when we put a scope on the pin). When the clock pulse was missed, the computer still read the output data bus. The "missed" sample would then come out on the next fifo read, so from that point on the high and low 8 bits of each sample were out of sync.
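    The failure can be modelled in a few lines. The sketch below is a simplified model, not the ri hardware logic: the function names are invented for the example, and the byte that appears on the bus when the pulse is missed is simply passed in as a parameter (0xFF is the value seen in the table above). Feeding it the corrected values of samples 0 to 15 from the table, with the Q1Lo pulse missed at sample 6, gives back the values that were actually recorded.

```python
def to_bytes(value):
    """Split a signed sample into (hi, lo) bytes of its 16-bit word."""
    word = value & 0xFFFF
    return word >> 8, word & 0xFF


def from_bytes(hi, lo):
    """Rebuild a signed 16-bit sample from hi and lo bytes."""
    word = (hi << 8) | lo
    return word - 0x10000 if word & 0x8000 else word


def read_with_missed_pulse(samples, miss_at, bus_lo=0xFF):
    """Read all samples back while the lo fifo misses one read pulse.

    samples : true signed sample values written into the fifo
    miss_at : index of the read where the lo fifo does not advance
    bus_lo  : byte seen on the lo half of the bus for that read (assumed)
    """
    hi_fifo = [to_bytes(v)[0] for v in samples]
    lo_fifo = [to_bytes(v)[1] for v in samples]
    out = []
    lo_pos = 0                      # read pointer of the lo fifo
    for n, hi in enumerate(hi_fifo):
        if n == miss_at:
            lo = bus_lo             # lo fifo did not clock; pointer stays put
        else:
            lo = lo_fifo[lo_pos]
            lo_pos += 1
        out.append(from_bytes(hi, lo))
    return out


# Corrected values of samples 0..15 from the table.
true_vals = [-3, 14, -2, 2, -7, 2, -6, -6, -9, -19, -5, 1, 9, -15, 4, 0]
recorded = read_with_missed_pulse(true_vals, miss_at=6)
print(recorded)
```

    After the missed pulse every hi byte is paired with the lo byte of the previous sample, which is the n,n-1 pattern in column 2 of the table.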

When did it first occur?

    Diego has data from feb02 that has the jumps. Data from apr03 does not have the jumps. So it has been around for a while.

Who else might see this jump?

    When the two bytes (hi, lo) get out of sync, data that goes from negative (0xff nn) to positive (0x00 mm), or from positive to negative, will show jumps. If the voltage levels are small, these jumps will be obvious. If the voltage levels are large, you won't be able to distinguish the jumps (of order 255) from the actual noise.
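    A small worked example of why the jumps only appear at sign changes, using values from the table. It assumes the slip is exactly one sample, so the hi byte of the current sample is paired with the lo byte of the previous one; the helper name is made up for the example.

```python
def desynced(prev, cur):
    """Value read when the hi byte of cur is paired with the lo byte of prev."""
    hi = (cur & 0xFFFF) >> 8        # hi byte of the current sample
    lo = prev & 0xFF                # lo byte left over from the previous sample
    word = (hi << 8) | lo
    return word - 0x10000 if word & 0x8000 else word


print(desynced(-6, -9))    # same sign: reads back the previous sample, -6
print(desynced(-5, 1))     # negative to positive: -5 + 256 = 251
print(desynced(9, -15))    # positive to negative: 9 - 256 = -247
```

    For signals of a few tens of counts the hi byte is only the sign, so while the sign stays the same the readout is just one sample late, and each zero crossing adds or subtracts 256.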

    Most aeronomy/sband radar programs take data continuously for 10 seconds to several minutes. Each new cycle of data taking clears the fifo (resyncing the hi/lo data if an error has occurred). Since Diego took data continuously for hours and had very low voltage levels, his experiment was more likely than most to show the problem.

    When I tried to generate the error using continuous sampling, it took on average about 200e6 samples before the error occurred.

Resolution:

    On 16jul03 the rdFifo pulse for all the fifos was lengthened (by 7.5 ns??). After this I ran the ri for 6 hours continuously (about 5e10 samples) and saw no glitches. I've also set up a macro for Diego (rdloop) that will take data for a specified number of buffers, stop, and then restart taking ri data. This clears the fifo so that an error cannot persist past the restart.

    Diego's data can be corrected. You need to search through the data until you start seeing these jumps. At that point shift the hi8, lo8 bytes so they are in sync again. Continue reading data with the shift applied until you start to see jumps again. This is where another missed fifo pulse occurred, so the shift has to be increased. You need to do this for all of the data. You also need to be careful not to mistake rfi/meteors for the jumps. That shouldn't be too hard, since once the jumps start they occur very often (every time the voltage crosses 0 volts).
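    A minimal sketch of this correction for a single missed pulse is shown below. It is not the processing code referenced at the bottom of this page; the function names and the 128 count threshold are assumptions chosen for data that normally stays within +/-20 counts. The test data are the recorded values of samples 0 to 16 from the table.

```python
def split(value):
    """Split a signed sample into (hi, lo) bytes of its 16-bit word."""
    word = value & 0xFFFF
    return word >> 8, word & 0xFF


def join(hi, lo):
    """Rebuild a signed 16-bit sample from hi and lo bytes."""
    word = (hi << 8) | lo
    return word - 0x10000 if word & 0x8000 else word


def find_first_jump(samples, threshold=128):
    """Index of the first sample with |value| > threshold, or None."""
    for n, v in enumerate(samples):
        if abs(v) > threshold:
            return n
    return None


def resync(samples, start, shift=1):
    """From index start on, pair the hi byte of sample n with the lo byte
    of sample n+shift. The last `shift` samples have no lo byte available
    and are dropped."""
    out = list(samples[:start])
    for n in range(start, len(samples) - shift):
        hi, _ = split(samples[n])
        _, lo = split(samples[n + shift])
        out.append(join(hi, lo))
    return out


recorded = [-3, 14, -2, 2, -7, 2, -1, -6, -6, -9, -19,
            251, 1, -247, 241, 4, -256]
print(find_first_jump(recorded))      # first visible jump
print(resync(recorded, start=6))      # shift applied from the missed pulse
```

    Note that in the table the missed pulse is at sample 6 but the first visible jump is at sample 11, because the jump only shows at the next zero crossing. Starting the shift at the first jump leaves the few samples in between one lo byte late, which is a small error for slowly varying data. A second missed pulse later in the run would need shift=2 from that point on.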
 

processing: x101/030715/doplot.pro, usr/t1748/doit.pro