Panel data (pooled cross-section and time-series) require special care. Here are some pointers.
Consider a data set composed of observations on each of n cross-sectional units (countries, states, persons or whatever) in each of T periods. Let each observation comprise the values of k variables of interest. The data set then contains knT values.
The data should be arranged "by observation": each row represents an observation; each column contains the values of a particular variable. The data matrix then has nT rows and k columns. That leaves open the matter of how the rows should be arranged. There are two possibilities.[1]
Rows grouped by unit. Think of the data matrix as composed of n blocks, each having T rows. The first block of T rows contains the observations on cross-sectional unit 1 for each of the periods; the next block contains the observations on unit 2 for all periods; and so on. In effect, the data matrix is a set of time-series data sets, stacked vertically.
Rows grouped by period. Think of the data matrix as composed of T blocks, each having n rows. The first n rows contain the observations for each of the cross-sectional units in period 1; the next block contains the observations for all units in period 2; and so on. The data matrix is a set of cross-sectional data sets, stacked vertically.
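The two arrangements can be made concrete with a small sketch. The Python below is only an illustration, not part of gretl: the function names are hypothetical, and each row is represented by its (unit, period) pair rather than by actual data. It lists the row order under each arrangement and shows one way to reorder rows grouped by period into rows grouped by unit.

```python
def stack_by_unit(n, T):
    """Row order for rows grouped by unit: n blocks of T rows each."""
    return [(unit, period)
            for unit in range(1, n + 1)
            for period in range(1, T + 1)]


def stack_by_period(n, T):
    """Row order for rows grouped by period: T blocks of n rows each."""
    return [(unit, period)
            for period in range(1, T + 1)
            for unit in range(1, n + 1)]


def period_to_unit_order(rows, n, T):
    """Reorder rows grouped by period into rows grouped by unit.

    Assumes the units appear in the same order within every period block,
    so the row for unit u in period t sits at index (t - 1) * n + (u - 1).
    """
    return [rows[(t - 1) * n + (u - 1)]
            for u in range(1, n + 1)
            for t in range(1, T + 1)]


# Example with n = 3 units and T = 2 periods.
by_unit = stack_by_unit(3, 2)
by_period = stack_by_period(3, 2)
restacked = period_to_unit_order(by_period, 3, 2)
```

Here `restacked` equals `by_unit`. The reordering relies on the units being listed in the same order inside every period block; if they are not, the index arithmetic picks up the wrong rows.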
You may use whichever arrangement is more convenient. The first is perhaps easier to keep straight. If you use the second then of course you must ensure that the cross-sectional units appear in the same order in each of the period data blocks. Among gretl's menus you will find an item "Restructure panel", which allows you to convert from stacked cross-section form to stacked time series.
When you import panel data into gretl from a spreadsheet or comma-separated format, the panel nature of the data will not be recognized automatically (most likely the data will be treated as "undated"). Getting the data recognized correctly is a two-step process: first, establish the periodicity of the data and the starting observation; second, establish the structure of the data (stacked time series or stacked cross-sections).
For panel data the periodicity equals the number of time periods, in the case of stacked time series, or the number of cross-sectional units, in the case of stacked cross-sections. (In either case it is the number of rows in each block of the data matrix.)
The starting observation should be set in the form 1:1 (for periodicity less than 10) or 1:01 (for periodicity between 10 and 99; add another leading zero if the periodicity is 100 or greater). In this colon-separated pair of numbers, the leading number represents the data-block and the trailing number represents the entry within that block. (Thus for example, with stacked time series the observation label 3:02 denotes the observation for unit 3, period 2.)
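The periodicity and labelling rules can be sketched in a few lines. This Python is a hedged illustration rather than gretl's own code: the function names are hypothetical, and padding the trailing number to the digit count of the periodicity is an extrapolation of the stated rule for periodicities of 1000 or more.

```python
def periodicity(n, T, stacked_time_series=True):
    """Rows per block: T for stacked time series, n for stacked cross-sections."""
    return T if stacked_time_series else n


def obs_label(block, entry, period):
    """Build a label such as 3:02 from a 1-based block and entry number.

    The entry is zero-padded to as many digits as the periodicity has
    (an assumption beyond the cases spelled out in the text).
    """
    width = len(str(period))
    return f"{block}:{entry:0{width}d}"


def row_label(row, period):
    """Label for a 0-based row index in the stacked data matrix."""
    block, entry = divmod(row, period)
    return obs_label(block + 1, entry + 1, period)


# Starting observations for periodicities 5, 20 and 100.
starts = [obs_label(1, 1, p) for p in (5, 20, 100)]

# With stacked time series and T = 20, unit 3 in period 2 is row index 41.
example = row_label(41, periodicity(n=10, T=20))
```

With these definitions `starts` is `['1:1', '1:01', '1:001']` and `example` is `'3:02'`, matching the unit 3, period 2 case in the text.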
The periodicity and starting observation can be set using the script command setobs or the GUI menu item "Sample, Set frequency, startobs…".
Once the periodicity and starting observation are set appropriately, you can impose the correct interpretation of the data structure using the script command panel or the GUI menu item "Sample, interpret as panel". The panel command takes an option, either --time-series (for stacked time series) or --cross-section (for stacked cross-sections). If no option is given, stacked time series is assumed. The "interpret as panel" menu item brings up a dialog box where you select stacked time series or stacked cross-sections.
[1] If you don't intend to make any conceptual or statistical distinction between cross-sectional and temporal variation in the data you can arrange the rows arbitrarily, but this is probably wasteful of information.