In classical, non-relativistic particle mechanics the action of a system is given by

$$ S[\vec{q}(t)] = \int_{t_1}^{t_2} L(\vec{q}, \dot{\vec{q}}, t) ~ dt $$

which is a functional $S: F^n \rightarrow \mathbb{R}$, i.e. it maps the $n$ functions $\vec{q}(t) = (q_1(t),\dots,q_n(t)) \in F^n$ of some function space $F$ to the real numbers.

The principle of least action states that the physically realized path $\vec{q}(t)$ is the one for which $S[\vec{q}(t)]$ is minimal. In fact, as we will see later, whether it is a minimum or a maximum does not matter and is just a convention. So in a sense, it is better to state that $S[\vec{q}(t)]$ is extremal.
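Before developing the formal machinery, the principle can be probed numerically. The following sketch assumes a concrete system not discussed above, a one-dimensional harmonic oscillator with $L = \frac{1}{2} m \dot{q}^2 - \frac{1}{2} k q^2$ and $m = k = 1$, and compares the action along the classical path with the action along a displaced path that shares its endpoints; both paths are illustrative choices:

```python
import numpy as np

# Assumed example system: 1D harmonic oscillator, L = m*qdot**2/2 - k*q**2/2.
m, k = 1.0, 1.0

def action(q, t):
    """Approximate S[q] = integral of L dt along a sampled path q(t)."""
    qdot = np.gradient(q, t)                            # numerical dq/dt
    L = 0.5 * m * qdot**2 - 0.5 * k * q**2
    return np.sum(0.5 * (L[1:] + L[:-1]) * np.diff(t))  # trapezoid rule

t = np.linspace(0.0, np.pi, 2001)
S_classical = action(np.sin(t), t)                      # sin(t) solves m*q'' = -k*q
S_displaced = action(np.sin(t) + 0.3 * np.sin(2 * t), t)  # same endpoints A, B
print(S_classical, S_displaced)                         # displaced action is larger
```

The displaced path starts and ends at the same points as the classical one (both added terms vanish at $t = 0$ and $t = \pi$), yet its action comes out strictly larger, consistent with the extremality claim.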

If you only know ordinary calculus of functions, there is a problem in converting this statement into one you can work with. Consider an ordinary function

\begin{equation}
\begin{split}
& f: \mathbb{R}^2 \rightarrow \mathbb{R} \\
& (x,y) \mapsto f(x,y)
\end{split}
\end{equation}

When this function is extremal, it certainly holds that

\begin{equation}
\begin{split}
& \frac{\partial f}{\partial x} (x_0, y_0) = 0 \\
& \frac{\partial f}{\partial y} (x_0, y_0) = 0
\end{split}
\end{equation}

where $(x_0,y_0)$ is the point at which $f$ has its extremum. These two identities can in turn be used to calculate $(x_0,y_0)$ explicitly. Unfortunately, no such simple recipe is available for functionals. Here is why:

A functional is a map from some function space $F$ to the real numbers $\mathbb{R}$. The function space $F$ is (just like $\mathbb{R}^2$ in our example) a vector space. However, its dimension is infinite! Hence there is no system of equations that can be solved in finitely many steps. The naive instruction “differentiate the functional along every direction of $F$ once and set the derivative to zero” cannot be carried out in a finite number of steps, since there are infinitely many directions.
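In the finite-dimensional case, by contrast, the recipe does terminate. A short symbolic sketch, where the function $f$ is an assumed example with a single extremum:

```python
import sympy as sp

# Assumed example function with one extremum (a minimum at (1, -2)).
x, y = sp.symbols('x y')
f = (x - 1)**2 + (y + 2)**2

# Set both partial derivatives to zero and solve -- finitely many equations,
# finitely many steps.
critical = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
print(critical)  # {x: 1, y: -2}
```

Two equations, two unknowns, done. It is exactly this termination that fails when the “unknowns” are the infinitely many directions of a function space.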

So what’s the solution? Well, we can transform the statement about extreme values of functions into a more complicated but equivalent form, whose analogue can be used to define a derivative for functionals. The equivalent statement is as follows: if $f$ is extremal at $(x_0,y_0)$, then the directional derivative

\begin{equation}
\begin{split}
\lim_{\varepsilon \rightarrow 0} \frac{f(x_0+\varepsilon n_1, y_0+\varepsilon n_2) - f(x_0,y_0)}{\varepsilon} = \nabla f (x_0,y_0) \cdot \vec{n} = 0
\end{split}
\end{equation}

is zero for all directions $\vec{n}=(n_1,n_2) \in \mathbb{R}^2$. A different way to write this is

\begin{equation}
\begin{split}
\left[ \frac{d}{d \varepsilon} f(\vec{r} + \varepsilon \vec{n}) \right]_{\varepsilon = 0} = 0
\end{split}
\end{equation}

at $\vec{r} = (x_0, y_0)$ for all directions $\vec{n} \in \mathbb{R}^2$. This expression carries over to functionals, because the problem of infinitely many conditions is now packaged into a single statement quantified over all directions $\vec{n}$.

For the action to have an extremum it holds that

\begin{equation}
\begin{split}
\lim_{\varepsilon \rightarrow 0} \frac{S[ \vec{q}(t) + \varepsilon \vec{n}(t) ] - S[ \vec{q}(t) ] }{\varepsilon} = \left[ \frac{d}{d \varepsilon} S[ \vec{q}(t) + \varepsilon \vec{n}(t) ] \right]_{\varepsilon = 0} \overset{!}{=} 0
\end{split}
\end{equation}

for all directions $\vec{n}(t) \in F^n$ in the function space, i.e. all directional derivatives vanish. One can bring this relation into a simpler form as follows:

\begin{equation}
\begin{split}
& \left[ \frac{d}{d \varepsilon} S[ \vec{q}(t) + \varepsilon \vec{n}(t) ] \right]_{\varepsilon = 0} = \left[ \frac{d}{d \varepsilon} \int_{t_1}^{t_2} L(\vec{q}(t) + \varepsilon \vec{n}(t), \dot{\vec{q}}(t) + \varepsilon \dot{\vec{n}}(t), t) ~ dt \right]_{\varepsilon = 0} \\
& = \int_{t_1}^{t_2} \sum_{i=1}^n \left( \frac{\partial L}{\partial q_i} \cdot n_i(t) + \frac{\partial L}{\partial \dot{q}_i} \cdot \dot{n}_i(t) \right) ~ dt \\
& = \sum_{i=1}^n \left[ \frac{\partial L}{\partial \dot{q}_i} \cdot n_i(t) \right]_{t_1}^{t_2} ~ + ~ \int_{t_1}^{t_2} \sum_{i=1}^n \left( \frac{\partial L}{\partial q_i} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_i} \right) \right) \cdot n_i(t) ~ dt
\end{split}
\end{equation}

where in the last step we have integrated by parts. Everything up to this point is the usual functional derivative. Now let’s put some physics on top:

To calculate this directional derivative we use the difference quotient $\frac{S[ \vec{q}(t) + \varepsilon \vec{n}(t) ] - S[ \vec{q}(t) ] }{\varepsilon}$. Hence, we essentially compare the value of $S$ at two different “points” of the function space. The following figure illustrates that:

The physical system evolves along $\vec{q}(t)$ to get from point $A$ at time $t_1$ to point $B$ at time $t_2$. The displaced path $\vec{q}(t) + \varepsilon \vec{n}(t)$ will in general not extremize the action $S$, but it must still start at $A$ and end at $B$. We don’t want to compare paths that can start and end wherever they want! The variation $\vec{n}(t)$ is therefore fixed at the boundary:

$$ \vec{n}(t_1) = \vec{n}(t_2) = 0 $$

This constrained form of the functional derivative is called the variational derivative, or simply the variation. Since $\vec{n}(t_1) = \vec{n}(t_2) = 0$, the boundary term from the integration by parts vanishes, and we obtain

\begin{equation}
\begin{split}
& \left[ \frac{d}{d \varepsilon} S[ \vec{q}(t) + \varepsilon \vec{n}(t) ] \right]_{\varepsilon = 0} = \int_{t_1}^{t_2} \sum_{i=1}^n \left( \frac{\partial L}{\partial q_i} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_i} \right) \right) \cdot n_i(t) ~ dt \overset{!}{=} 0
\end{split}
\end{equation}

Remember that the displacements $n_i(t)$ are completely arbitrary (apart from their values at the boundary). In particular, we could choose each $n_i(t)$ to have the same sign as $\frac{\partial L}{\partial q_i} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_i} \right)$ everywhere, so that the integrand is $\ge 0$. For the integral to vanish with this choice, the integrand itself must vanish identically. Hence it already has to hold that

$$ \frac{\partial L}{\partial q_i} - \frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_i} \right) = 0 $$

for all $i = 1, \dots, n$, independently of the values of $n_i(t)$. These equations are called the Euler-Lagrange equations, and they are a necessary condition for the action $S$ to be extremal.
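SymPy can carry out this derivation mechanically. As before, the harmonic-oscillator Lagrangian is an assumed example for illustration:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

# Assumed example: 1D harmonic oscillator, L = m*qdot**2/2 - k*q**2/2.
t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)

L = m * sp.diff(q, t)**2 / 2 - k * q**2 / 2
eqs = euler_equations(L, [q], [t])   # expect m*q'' = -k*q, Newton's law for a spring
print(eqs)
```

The single Euler-Lagrange equation that comes out is $-k q - m \ddot{q} = 0$, i.e. exactly Newton's equation of motion for a mass on a spring.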

Now what about the actual value of $S$: is it a minimum or a maximum? The answer is: it doesn’t matter. For instance, suppose $S$ had a maximum at the solution $\vec{q}(t)$ of the Euler-Lagrange equations. Then certainly $-S$ would have a minimum. The condition for $-S$ to be minimal is given by the Euler-Lagrange equations with $L$ replaced by $-L$. However, this replacement does not alter the equations at all; they are invariant under it. Demanding that a physical system minimizes or maximizes a certain quantity is therefore completely irrelevant for the behavior of that system; it’s just a convention.
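The sign-flip argument can also be checked symbolically, again with the harmonic-oscillator Lagrangian as an assumed example:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

# Check: replacing L by -L leaves the Euler-Lagrange equations invariant.
t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)
L = m * sp.diff(q, t)**2 / 2 - k * q**2 / 2

eq_L  = euler_equations(L,  [q], [t])[0]
eq_mL = euler_equations(-L, [q], [t])[0]

# The two left-hand sides differ only by an overall sign, so both equations
# have exactly the same solutions.
print(sp.simplify(eq_L.lhs + eq_mL.lhs))  # 0
```

Multiplying an equation of the form $(\dots) = 0$ by $-1$ changes nothing about its solution set, which is the whole point: the dynamics cannot distinguish a maximized action from a minimized one.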