By now, you've seen several example applications. All of them set up a pipeline and call gst_bin_iterate () to start media processing. You might have started wondering what happens during pipeline iteration. This whole process of media processing is called scheduling. Scheduling is considered one of the most complex parts of GStreamer. Here, we will do no more than give a global overview of scheduling, most of which will be purely informative. It might help in understanding the underlying parts of GStreamer.
The scheduler is responsible for managing the plugins at runtime. Its main responsibilities are:
Managing data throughput between pads and elements in a pipeline. This might sometimes imply temporary data storage between elements.
Calling functions in elements that do the actual data processing.
Monitoring state changes and enabling/disabling elements in the chain.
Selecting and distributing the global clock.
The scheduler is a pluggable component; this means that alternative schedulers can be written and plugged into GStreamer. Usually, though, you do not need to choose a scheduler yourself. The default scheduler in GStreamer is called "opt". Some of the concepts discussed here are specific to opt.
To understand some specifics of scheduling, it is important to know how elements work internally. Largely, there are four types of elements: _chain ()-based elements, _loop ()-based elements, _get ()-based elements and decoupled elements. Each of these has a set of features and limitations that are important for how they are scheduled.
_chain ()-based elements are elements that have a _chain ()-function defined for each of their sinkpads. Those functions will receive data whenever input data is available. In those functions, the element can push data over its source pad(s) to peer elements. _chain ()-based elements cannot pull additional data from their sinkpad(s). Most elements in GStreamer are _chain ()-based.
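To make this calling convention concrete, here is a minimal sketch in C of a chain-based filter feeding a chain-based sink. It is a hypothetical toy model, not the GStreamer API: ToyPad, toy_pad_push (), doubler_chain () and collector_chain () are invented names, and a "buffer" is reduced to an int. The point it illustrates is that pushing over a source pad amounts to a direct call of the peer's chain function, and that a chain function has no call available for pulling more input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model (not the GStreamer API): a "buffer" is an int. */
typedef struct ToyPad ToyPad;
typedef void (*ToyChainFunc)(ToyPad *sinkpad, int buffer);

struct ToyPad {
    ToyChainFunc chain; /* set on the sink pad of a chain-based element */
    ToyPad *peer;       /* for a source pad: the sink pad it is linked to */
    void *user;         /* element state reachable from the pad */
};

/* Pushing over a source pad directly calls the peer's chain function. */
static void toy_pad_push(ToyPad *srcpad, int buffer)
{
    assert(srcpad->peer != NULL && srcpad->peer->chain != NULL);
    srcpad->peer->chain(srcpad->peer, buffer);
}

/* Chain-based filter: called once per incoming buffer. It may push
 * downstream, but it has no way to pull more input. */
static void doubler_chain(ToyPad *sinkpad, int buffer)
{
    ToyPad *srcpad = sinkpad->user; /* the filter's own source pad */
    toy_pad_push(srcpad, buffer * 2);
}

/* Chain-based sink: records every buffer it receives. */
typedef struct { int items[16]; size_t count; } Collector;

static void collector_chain(ToyPad *sinkpad, int buffer)
{
    Collector *c = sinkpad->user;
    assert(c->count < 16);
    c->items[c->count++] = buffer;
}
```

Calling the filter's chain function with 3 and then 5 leaves 6 and 10 in the collector, without any data being stored between the two elements.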
_loop ()-based elements are elements that have a _loop ()-function defined for the whole element. Inside this function, the element can pull buffers from its sink pad(s) and push data over its source pad(s) as it sees fit. Such elements usually require specific control over their input. Muxers and demuxers are usually _loop ()-based.
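The sketch below models a muxer-like loop function in the same hypothetical style. ToySinkPad, toy_pad_pull (), interleave_loop () and the TOY_EOS end-of-stream marker are invented for illustration and are not GStreamer API. Unlike a chain function, the loop function itself decides when, and from which sink pad, it pulls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model (not the GStreamer API). A "buffer" is an int;
 * TOY_EOS marks end of stream. */
#define TOY_EOS (-1)

/* Input side of a sink pad: here simply a fixed array of buffers. */
typedef struct { const int *data; size_t len, pos; } ToySinkPad;

/* Pulling returns the next buffer, or TOY_EOS when input is exhausted. */
static int toy_pad_pull(ToySinkPad *pad)
{
    return pad->pos < pad->len ? pad->data[pad->pos++] : TOY_EOS;
}

/* Output side: pushed buffers are collected in an array. */
typedef struct { int items[32]; size_t count; } ToySrcPad;

static void toy_pad_push(ToySrcPad *pad, int buffer)
{
    assert(pad->count < 32);
    pad->items[pad->count++] = buffer;
}

/* Muxer-like loop function: one call pulls one buffer from each sink pad,
 * in an order the element chooses, and pushes whatever it got.
 * Returns false once both inputs are at end of stream. */
static bool interleave_loop(ToySinkPad *a, ToySinkPad *b, ToySrcPad *out)
{
    int x = toy_pad_pull(a);
    int y = toy_pad_pull(b);
    if (x != TOY_EOS)
        toy_pad_push(out, x);
    if (y != TOY_EOS)
        toy_pad_push(out, y);
    return x != TOY_EOS || y != TOY_EOS;
}
```

Whoever calls interleave_loop () repeatedly plays the role of the scheduler here; with inputs {1, 3, 5} and {2, 4}, the output is 1, 2, 3, 4, 5.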
_get ()-based elements are elements with only source pads. For each source pad, a _get ()-function is defined, which is called whenever the peer element needs additional input data. Most source elements are, in fact, _get ()-based. Such an element cannot actively push data.
Decoupled elements are elements whose source pads are _get ()-based and whose sink pads are _chain ()-based. The _chain ()-function cannot push data over its source pad(s), however. One such element is the "queue" element, which is a thread boundary element. Since only one side of such an element is relevant to any one scheduler, we can safely handle these elements as if they were either _get ()- or _chain ()-based. Therefore, we will omit this type of element from the rest of the discussion.
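A toy version of a decoupled element shows why it can be treated as two halves. In this hypothetical sketch (ToyQueue, queue_chain () and queue_get () are invented names, and this is not how the real "queue" element is implemented), the chain-based sink side only stores buffers and the get-based source side only hands them out, so each side looks like an ordinary chain- or get-based element to whatever it is linked to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a decoupled, queue-like element
 * (not the GStreamer "queue" implementation). A "buffer" is an int. */
#define QUEUE_CAPACITY 8

typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head;  /* index of the oldest stored buffer */
    size_t count; /* number of stored buffers */
} ToyQueue;

/* Sink side, chain-based: called when upstream pushes a buffer. It only
 * stores the buffer and never pushes anything downstream itself.
 * Returns false if full (a real thread-boundary queue would typically wait). */
static bool queue_chain(ToyQueue *q, int buffer)
{
    if (q->count == QUEUE_CAPACITY)
        return false;
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = buffer;
    q->count++;
    assert(q->count <= QUEUE_CAPACITY);
    return true;
}

/* Source side, get-based: called when downstream asks for data.
 * Returns false if nothing is stored yet. */
static bool queue_get(ToyQueue *q, int *out)
{
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```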
Obviously, the types of the elements that are linked together have implications for how those elements will be scheduled. If a get-based element is linked to a loop-based element and the loop-based element requests data from its sinkpad, we can just call the get-function and be done with it. However, if two loop-based elements are linked to each other, it's a lot more complicated. Similarly, a loop-based element linked to a chain-based element is a lot easier to schedule than two loop-based elements linked to each other.
The default GStreamer scheduler, "opt", uses a concept of chains and groups. A group is a series of elements that do not require any context switches or intermediate data stores to be executed. In practice, this implies zero or one loop-based elements, one get-based element (at the beginning) and any number of chain-based elements. If there is a loop-based element, the scheduler will simply call this element's loop-function to iterate. If there is no loop-based element, data will be pulled from the get-based element and pushed through the chain-based elements.
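The iteration of a group without a loop-based element can be sketched as follows. This is a hypothetical toy model, not the code of "opt": ToyGroup, group_iterate () and the example elements are invented, and each chain-based element is simplified to hand on exactly one buffer instead of pushing zero or more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one opt-style group without a loop-based
 * element (not the real scheduler code). A "buffer" is an int. */
#define TOY_EOS (-1)
#define MAX_CHAIN 4

typedef int (*ToyGetFunc)(void *state);           /* get-based element */
typedef int (*ToyChainFunc)(void *state, int buf); /* chain-based element */

typedef struct {
    ToyGetFunc get;                /* the get-based element at the start */
    void *get_state;
    ToyChainFunc chain[MAX_CHAIN]; /* chain-based elements, in link order */
    void *chain_state[MAX_CHAIN];
    size_t n_chain;
} ToyGroup;

/* One iteration: pull a buffer from the get-based element and pass it
 * through every chain-based element in turn. Nothing is stored between
 * elements and no context switch is needed. Returns false at EOS. */
static bool group_iterate(ToyGroup *g)
{
    assert(g->get != NULL && g->n_chain <= MAX_CHAIN);
    int buf = g->get(g->get_state);
    if (buf == TOY_EOS)
        return false;
    for (size_t i = 0; i < g->n_chain; i++)
        buf = g->chain[i](g->chain_state[i], buf);
    return true;
}

/* Example elements: a source producing 0, 1, 2, a filter, and a sink. */
static int counter_get(void *state)
{
    int *next = state;
    return *next < 3 ? (*next)++ : TOY_EOS;
}

static int add_ten_chain(void *state, int buf)
{
    (void)state;
    return buf + 10;
}

static int sum_sink_chain(void *state, int buf)
{
    int *sum = state;
    *sum += buf;
    return buf;
}
```

Iterating until end of stream runs the group three times, and the sink ends up with 10 + 11 + 12 = 33.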
A chain is a series of groups that depend on each other for data. For example, two linked loop-based elements would end up in different groups, but in the same chain. Whenever the first loop-based element pushes data over its source pad, the data will be temporarily stored inside the scheduler until the loop-function returns. When it's done, the loop-function of the second element will be called to process this data. If it pulls data from its sinkpad while no data is available, the scheduler will "emulate" a get-function and, in this function, iterate the first group until data is available.
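The buffering and the emulated get-function can be sketched with a toy model of two linked loop-based elements. All names here (ToyChain, sched_store (), sched_pull (), first_loop (), second_loop ()) are hypothetical, and the fixed-size store is a simplification; the sketch only illustrates the idea and does not reproduce opt's implementation. The second element drives processing, and sched_pull () iterates the first group whenever the store is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of an opt-style chain of two groups, each with
 * one loop-based element (not the real scheduler). A "buffer" is an int. */
#define TOY_EOS (-1)
#define STORE_CAP 16

typedef struct {
    int items[STORE_CAP]; /* buffers stored by the scheduler between groups */
    size_t head, count;
    int produced;         /* state of the first (upstream) loop element */
    int limit;
} ToyChain;

/* A push from group 1: group 2 cannot run right now, so store the buffer. */
static void sched_store(ToyChain *c, int buf)
{
    assert(c->count < STORE_CAP);
    c->items[(c->head + c->count) % STORE_CAP] = buf;
    c->count++;
}

/* Loop function of the first element: each call pushes up to two buffers
 * (an arbitrary choice for the example). Returns false at end of stream. */
static bool first_loop(ToyChain *c)
{
    if (c->produced >= c->limit)
        return false;
    for (int i = 0; i < 2 && c->produced < c->limit; i++)
        sched_store(c, c->produced++);
    return true;
}

/* The emulated get-function: while nothing is stored, iterate the first
 * group until it produces data or reaches end of stream. */
static int sched_pull(ToyChain *c)
{
    while (c->count == 0)
        if (!first_loop(c))
            return TOY_EOS;
    int buf = c->items[c->head];
    c->head = (c->head + 1) % STORE_CAP;
    c->count--;
    return buf;
}

/* Loop function of the second element: pulls one buffer per call and adds
 * it to a running total. Returns false at end of stream. */
static bool second_loop(ToyChain *c, int *total)
{
    int buf = sched_pull(c);
    if (buf == TOY_EOS)
        return false;
    *total += buf;
    return true;
}
```

Because first_loop () pushes two buffers per call, every other pull by the second element is served from the store without running the first group at all.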
The above is roughly how scheduling works in GStreamer. This has some implications for ideal pipeline design. A pipeline would ideally contain at most one loop-based element, so that all data processing is immediate and no data is stored inside the scheduler during group switches. You might expect these group switches to add significant overhead. In practice, however, it is not so bad. It's something to keep in the back of your mind, nothing more.