A Very Brief Introduction to Measure Theory and the Lebesgue Integral (Part I)

In the next few posts, I shall be discussing recent topics of study that, to me at least, have been very intriguing. In previous posts, I have talked about Hilbert spaces. I have of late been considering the mathematics necessary to understand formally, in a pure mathematical sense, what a Hilbert space is. This post, like the others on this site, serves as a reference for newly learned topics that are of interest (to me, at least; such a comment is subjective, of course).

The purpose of this post is two-fold: (1.) to provide an update on what I’ve been up to; (2.) to introduce some interesting mathematics that has expanded my understanding of the “size” of a set as well as of operations such as differentiation and integration.

Here is a quick summary of what I plan to cover in the next few posts (to a brief extent):

  1. Elementary Sets and their Measure: Here I will discuss the concept of length and try to extend length in higher dimensions to that of a measure of a set. Much of this topic will rely on geometric intuition.
  2. Lebesgue Measure: This section will discuss the concept of Lebesgue measure and distinguish it from the elementary measure. Brief mention will also be made of the measurability of sets and functions.
  3. General Measure: Discussion will be made of a general measure as a function as well as measurable spaces and measure spaces.
  4. Lebesgue Integral: This topic will introduce the concept of the Lebesgue integral as compared to the Riemann integral.
  5. L^{p} and l^{p} spaces: This section will discuss the concept of a norm as it relates to the spaces L^{p} and l^{p}, and will define each space. We will also introduce the concept of Banach spaces.
  6. Proof that l^{p} space is a Banach space.

Section 1: Elementary Sets and their Measure:

The question that we want to answer is this: Given an arbitrary set, how do we go about measuring it?

In order to understand the difficulties present in this question we must first consider what are called elementary sets and the elementary measure. Elementary sets are those sets which are intuitively easy to measure; that is, intervals, rectangles, and boxes. We now give the formal definition of an elementary set:

Definition. (Interval; Elementary Set) We define an interval to be a subset of the real line \mathbb{R} which takes one of the following forms:

[a,b] := \{x\in \mathbb{R}|a\leq x \leq b\} \label{(1.1)};

[a,b) := \{x\in \mathbb{R}|a\leq x < b\} \label{(1.2)};

(a,b] :=  \{x\in \mathbb{R}|a< x\leq b\} \label{(1.3)};

(a,b):= \{x\in \mathbb{R}|a< x< b\} \label{(1.4)},

where a\leq b are real numbers. The length of an interval I with endpoints a and b is defined to be l(I):= b-a, regardless of which endpoints are included. For dimensions d\geq 2, a box B is the Cartesian product B = I_{1}\times \cdots \times I_{d} of d intervals, and we define its measure to be the product of the lengths of those intervals; that is,

\displaystyle m(B) := \prod_{i=1}^{d}l(I_{i}); \label{(2)}

we sometimes call such sets of dimension 2 or greater “boxes.” Thus, elementary sets are those subsets E of \mathbb{R}^{d} such that

\displaystyle E= \bigcup_{i=1}^{n}B_{i}, \label{(3)}

where each B_{i} is a d-dimensional box contained in \mathbb{R}^{d} and n is finite.

What this definition is doing is the following: first it introduces the concept of an interval and establishes the well-understood concept of its length as being the difference between the two endpoints provided one is less than the other. The definition then generalizes the idea of a length to 2 and 3 dimensions and beyond. Note that in 2-dimensions the interval then becomes a rectangle in the plane. Thus, the measure of the length of an interval then becomes the measure of the area of a rectangle. Similarly, for d=3 we replace rectangles with cubes and the area with the volume. For dimensions d>3, we replace cubes with boxes of d-dimension. Therefore, elementary sets are those subsets of d-dimensional real space that are unions of finitely-many boxes.
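To make the definition concrete, here is a minimal Python sketch of the measure of a box as a product of interval lengths. The function names are my own, and the last function assumes the standard fact (not stated above) that the measure of a finite union of boxes that overlap at most along their boundaries is the sum of the measures of the boxes.

```python
from math import prod

def interval_length(a, b):
    """Length l(I) = b - a of an interval with endpoints a <= b."""
    if a > b:
        raise ValueError("expected a <= b")
    return b - a

def box_measure(intervals):
    """Measure of the box I_1 x ... x I_d: the product of the interval lengths."""
    return prod(interval_length(a, b) for a, b in intervals)

def elementary_measure(boxes):
    """Measure of a finite union of boxes, ASSUMING the boxes overlap
    at most along their boundaries, so that the measures simply add."""
    return sum(box_measure(box) for box in boxes)

# A rectangle (d = 2) and a box in three dimensions (d = 3).
rectangle = [(0, 2), (0, 3)]
cuboid = [(0, 1), (0, 2), (0, 3)]

# An L-shaped elementary set built from two rectangles sharing only an edge.
l_shape = [[(0, 2), (0, 1)], [(0, 1), (1, 2)]]

print(box_measure(rectangle), box_measure(cuboid), elementary_measure(l_shape))
```

Each box is represented only by its endpoints, since the length of an interval does not depend on whether the endpoints are included.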

Section 2: Lebesgue Measure

In the last section, we discussed sets that we can measure quite easily. Ideally, though, we would like to be able to measure sets that are more general than elementary sets. This requires a different way of measuring, which leads us to the Lebesgue measure.

In order to introduce the Lebesgue measure we first need to introduce the concept of the outer measure, which we now define:

Definition. (Outer Measure) We define the outer measure of a set E\subset \mathbb{R}, denoted m^{*}(E) to be

\displaystyle m^{*}(E) = \inf\bigg\{\sum_{k=1}^{\infty}l(I_{k}) \,\bigg|\, I_{k} \text{ is an open interval for every } k\in \mathbb{N} \text{ and } E \subset \bigcup_{k=1}^{\infty}I_{k}\bigg\}.

The outer measure in a sense “overestimates” the size of a given set by covering it with countably many open intervals, and then takes the infimum (the greatest lower bound) of all such overestimates. Thus, it estimates the size of the given set “from the outside,” and is used in lieu of the elementary measure when we are dealing with sets that we cannot easily measure in a geometrically intuitive way.
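As an illustration of estimating "from the outside," the sketch below covers the k-th point of a list of rationals in [0,1] with an open interval of length \epsilon/2^{k}, so the total length of the cover stays below \epsilon no matter how many points are listed. This is the standard argument that a countable set has outer measure zero; it is my own addition, and the code necessarily works with a finite list only.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """Distinct rationals p/q in [0, 1] with denominator q <= max_denominator."""
    return sorted({Fraction(p, q)
                   for q in range(1, max_denominator + 1)
                   for p in range(q + 1)})

def cover_total_length(points, eps):
    """Cover the k-th point with an open interval of length eps / 2**k
    (k = 1, 2, ...) and return the total length of the cover."""
    return sum(Fraction(eps) / 2**k for k in range(1, len(points) + 1))

eps = Fraction(1, 100)
points = rationals_in_unit_interval(20)
total = cover_total_length(points, eps)

# The total is eps * (1 - 2**(-n)) for n points, which is always below eps.
print(len(points), float(total), total < eps)
```

Since \epsilon can be chosen as small as we like, the infimum over all such covers is zero, even though the rationals are dense in [0,1].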

We conclude this post with the definition of the Lebesgue measure given in two forms; the first will be in terms of what we have defined so far, and the second will be defined in terms that will be covered in the next post.

Definition. (Lebesgue Measure.) If E is a Lebesgue measurable set (measurability will be covered in the next post), we define its Lebesgue measure to be m(E):=m^{*}(E); that is, its measure is equal to its outer measure.

The second way of defining this is as follows:

Definition. (Lebesgue Measure V.2) The Lebesgue measure is the measure on the measurable space (\mathbb{R},\mathcal{L}), where \mathcal{L} is the \sigma-algebra of Lebesgue measurable subsets of \mathbb{R}, that assigns to each Lebesgue measurable set its outer measure.

The next post will discuss measures in general, as well as measurable sets, measurable spaces, Borel sets, and \sigma-algebras.

Until then, clear skies!

Open covers, Finite Subcovers, and COMPACTNESS

A second topological concept that is introduced in analysis is compactness. It is a concept that is associated with the Bolzano-Weierstrass Theorem, which is as follows:

THM. (Bolzano-Weierstrass). Let A be any infinite bounded subset of \mathbb{R}. Then there is at least one x\in \mathbb{R} such that every open ball centered on x contains at least one point of A other than x.

The idea of the proof of this statement is to show that (B_{x}(\epsilon)\setminus \{x\})\cap A \neq \emptyset for every \epsilon>0.

Insofar as compactness is concerned, there are a few different ways to introduce the concept. I will present the various definitions and state the theorem that they are all equivalent.

Method 1: Open Covers and Finite Subcovers.
In order to define compactness in this way, we need to define a few things; the first of which is an open cover.

Definition. [Open Cover.] Let (X,d) be a metric space with the defined metric d. Let A\subset X. Then an open cover for A is a collection of open sets \{O_{\alpha}|\alpha \in \Lambda\}, indexed by some set \Lambda, such that
\displaystyle A \subset \bigcup_{\alpha\in \Lambda}O_{\alpha}.

N.B. The collection of open subsets O_{\alpha} may be of infinite cardinality.

Another definition that we will need is the following:

Definition. [Limit Point/Cluster Point.] Let (X,d) be a metric space, let A\subset X, and let x_{0}\in X. Then x_{0} is a limit point or a cluster point of A if every open ball centered at x_{0} contains an infinite number of points from A.

We need one more definition before we define compactness:

Definition. [Finite Subcover.] Given an open cover \{O_{\alpha}\} of A, a finite subcover is a finite subcollection of open sets O_{\alpha_{1}},...,O_{\alpha_{n}} from \{O_{\alpha}\} such that
\displaystyle A\subset \bigcup_{i = 1}^{n}O_{\alpha_{i}}.

Therefore, we can now define compactness as follows:

Definition. [Compact Set.] Let (X,d) be a metric space with the defined metric d, and let A\subset (X,d). Then we say that A is compact if every open cover for A has a finite subcover.

To make this more concrete, consider the following example:
Example: Let X= \mathbb{R} and let d:\mathbb{R}\times \mathbb{R}\rightarrow \mathbb{R} be defined by d(p,q)=|p-q|. Then the open interval (0,1) is not a compact set. To see why, consider the collection of open sets (1/n,1) for n\in \mathbb{N}. Note that (0,1)\subset \bigcup_{n \in \mathbb{N}}(1/n,1). However, for every finite m,
\displaystyle (0,1) \not\subset \bigcup_{n=1}^{m}(1/n,1). In words, consider all of the open sets of the form (1/n,1) (e.g. (1,1), (1/2,1), (1/3,1),…; note that (1,1) is the empty set). For each n =1,2,3,…, the open set increases in size*. Every x in (0,1) satisfies 1/n<x for all sufficiently large n, so the union of all of these intervals contains the interval (0,1). However, if we take only finitely many of them, for simplicity say m=3, then the union (1,1) \cup (1/2,1) \cup (1/3,1) = (1/3,1) does not contain all of the points of (0,1); for instance, it misses 1/4. The same happens for any finite subcollection, since its union is (1/m,1), where m is the largest index used. Therefore, while this collection is an open cover for (0,1), it has no finite subcover, and so (0,1) is not a compact set.
*: There is a concept related to the size of an interval which lends itself to a field of study in analysis called measure theory (may post on this topic at a later time).
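The argument in the example can be checked mechanically. The following Python sketch (the helper names are mine, not from any text) exhibits, for a few values of m, a point of (0,1) that the first m intervals fail to cover, and confirms that a later interval in the full cover does contain it.

```python
from fractions import Fraction

def in_finite_union(x, m):
    """True if x lies in (1/n, 1) for some n = 1, ..., m."""
    return any(Fraction(1, n) < x < 1 for n in range(1, m + 1))

def missed_point(m):
    """A point of (0, 1) that the first m intervals fail to cover."""
    return Fraction(1, m + 1)

for m in (3, 10, 1000):
    x = missed_point(m)
    # x is in (0, 1) but in none of (1/1, 1), ..., (1/m, 1).
    assert 0 < x < 1 and not in_finite_union(x, m)
    # The interval with n = m + 2 does contain x, so the full cover catches it.
    assert in_finite_union(x, m + 2)
```

Exact rational arithmetic is used so that the strict inequalities are not affected by floating-point rounding.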

Method 2: Sequences and Subsequences:
This approach has the benefit that we can just state the definition outright:

Definition. [Compact Set.] A set A\subset \mathbb{R} is compact if every sequence in A has a subsequence that converges to a limit that is also in A.

There is one other type of “definition” used to understand compactness. Some books call this the Characterization of Compactness on the Real Line.
Theorem. [Compact Set.] Let A\subset \mathbb{R}. Then A is compact if and only if A is closed and bounded.

The following theorem states that these different ways of defining compactness are in fact equivalent:

Theorem. Let A\subset \mathbb{R}. Then the following statements are equivalent:
(1.) Every sequence in A has a subsequence that converges to a limit in A;
(2.) A is closed and bounded;
(3.) Every open cover \{O_{\alpha}\} of A has a finite subcover.

The implication that (2.) implies (3.) is what is referred to as the Heine-Borel Theorem. Furthermore, to circle back to the Bolzano-Weierstrass Theorem we can rewrite this statement in terms of compactness:

Theorem. [Bolzano-Weierstrass Theorem.] Let (X,d) be a compact metric space, and let A be an infinite subset of (X,d). Then A has at least one cluster point.

The next post will discuss the proofs of the theorems in this post. Further posts will most likely be on astrophysics and/or cosmology. Until then, clear skies!


If one takes quantum mechanics, when they first encounter the wavefunction, which is a complex-valued function, they learn that the arena in which quantum mechanics takes place is a Hilbert space. If one goes further in order to understand what a Hilbert space is, they find that it is a complete inner product space. While many physicists take advantage of this fact, they do not really interest themselves in what this means in a rigorous mathematical sense. When I first encountered this, I was unsatisfied with the so-called “definition” of a Hilbert space. So I found that I had to learn more advanced mathematics; more specifically, real analysis. To that end, the purpose of this post is to understand what the term “complete” means. To remedy any confusion about what an inner product space is: an inner product space is a vector space V that is equipped with an inner product \langle u, v \rangle.

In order to understand what completeness is, we require a couple of definitions:

Definition. Let \{p_{n}\}_{n=1}^{\infty} be a sequence of points in the metric space (E,d). A point p\in E is called a limit of the sequence of points if for any \epsilon>0, there exists N\in \mathbb{N} such that if n>N, then d(p,p_{n})< \epsilon. If such a limit exists, then we say that the sequence of points \{p_{n}\}_{n=1}^{\infty} converges to the point p\in E.

What this says intuitively, is that in the sequence of points above there exists a term for which n=N which corresponds to the point p_{N} in the metric space E, beyond which any later terms in the sequence will be contained in what we call an open ball which is defined to be the set given by B_{p}(\epsilon)= \{q\in E|d(q,p)<\epsilon\}. We can regard the term p_{N} as a “boundary point”.

Definition. A sequence of points \{p_{n}\}_{n=1}^{\infty} in a metric space (E,d) is said to be a Cauchy sequence, if for any \epsilon>0, there exists N\in \mathbb{N} such that whenever n,m>N, d(p_{n},p_{m})< \epsilon.

The intuitive idea behind this concept is the following: take any two terms in the sequence of points \{p_{n}\}_{n=1}^{\infty}. The sequence is a Cauchy sequence if, whenever these two chosen terms are “beyond the boundary,” they are within \epsilon of each other in the metric space (E,d).

One important result that I am not going to prove is the following:

Theorem. If \{p_{n}\}_{n=1}^{\infty} is a convergent sequence of points in the metric space (E,d), then such a sequence is Cauchy.

An important note: the converse of this theorem is not necessarily true. If the converse is indeed true, we get the following definition:

Definition. A metric space (E,d) is said to be complete if every Cauchy sequence of points in the metric space (E,d) converges to a point p\in E.

An example of this is that \mathbb{R} with the metric d(p,q)=|p-q| is a complete metric space. Intuitively, what this means is that any possible Cauchy sequence of real numbers will converge to some real number p.
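To see why completeness is not automatic, it helps to look at a space that fails to be complete. The sketch below is my own illustration, not taken from the definitions above: it uses Newton's iteration x_{n+1} = (x_{n} + 2/x_{n})/2 to build a sequence of rational numbers whose terms bunch together, consistent with being Cauchy, while the squares approach 2. No rational number squares to 2, so the sequence has no limit in \mathbb{Q} with the metric d(p,q)=|p-q|, even though it converges in \mathbb{R}.

```python
from fractions import Fraction

def newton_sqrt2(n_terms):
    """First n_terms of x_{k+1} = (x_k + 2 / x_k) / 2 with x_1 = 1, as exact rationals."""
    x = Fraction(1)
    terms = [x]
    for _ in range(n_terms - 1):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

terms = newton_sqrt2(7)

# Distances between consecutive terms shrink rapidly.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
for gap in gaps:
    print(float(gap))

# Every term is rational, and none of them squares exactly to 2.
print(all(t * t != 2 for t in terms))
```

Shrinking consecutive gaps alone do not prove the Cauchy property in general; here the very fast decay is what makes the terms cluster.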

The next post will discuss compactness in the context of metric spaces, covers, and open covers.

Clear Skies!

Introduction to Metric Spaces

Metric spaces are one of those mathematical topics that everyone intuitively understands. The best example of this is that of three-dimensional Euclidean space E^{3}. This serves as the basis for the intuitive concept of a “space”, and our ability to ascribe a distance between two points in three-dimensional space can be described by a distance function d: E\times E \rightarrow \mathbb{R}, or a metric. The underlying set E together with the metric d form what is called a metric space (E,d).

To be more mathematically precise, we make the following definition.

Definition. A metric space is a set E together with a rule which associates with each pair p,q\in E a real number d(p,q) such that
(1.) \displaystyle d(p,q)\geq 0, \forall p,q\in E
(2.) \displaystyle d(p,q) = 0 \iff p=q
(3.) \displaystyle d(p,q)=d(q,p), \forall p,q\in E
(4.) \displaystyle d(p,r) \leq d(p,q)+ d(q,r), \forall p,q,r\in E

As an example, suppose that the underlying set is E = \mathbb{R} and the metric coupled with this set is defined by d(p,q)= |p-q|. To verify that this is indeed a metric space we must show that the four axioms are satisfied.

Claim. The mathematical structure (\mathbb{R},d) in which d:\mathbb{R}\times \mathbb{R}\rightarrow \mathbb{R} is defined by d(p,q)=|p-q| is a metric space.
Proof. Let p,q\in \mathbb{R}. Then by definition of d(p,q) and by definition of the absolute value function, we have that d(p,q)=|p-q|\geq 0, so that axiom 1 is satisfied. Suppose now that the points in \mathbb{R} are equal, i.e. that p=q. Then by definition of d(p,q), we have that |p-q|=|0|=0. Conversely, suppose that |p-q|= 0. The only real number whose absolute value is zero is zero itself, so p-q=0; that is, p=q. Thus, condition (2.) is satisfied and hence the distance between two points in \mathbb{R} is zero if and only if the two points are the same. To prove condition (3.), let p,q\in \mathbb{R}, so that d(p,q) = |p-q|. By virtue of the definition of the absolute value, we can say that
\displaystyle d(p,q) = |p-q| = |-(q-p)|= |-1||q-p|= |q-p|= d(q,p).
Thus, we see that the proposed distance function is symmetric with respect to its arguments, namely any real numbers p,q\in \mathbb{R}. To prove condition (4.), consider any three points p,q,r\in \mathbb{R}; then the distance between the points p,r \in \mathbb{R} becomes
\displaystyle d(p,r) = |p-r| = |p-q+q-r|,
wherein we add zero in the form of adding and subtracting the point q (a very common trick in analysis). Then by the triangle inequality for the absolute value it follows that
\displaystyle d(p,r) = |p-q+q-r| \leq |p-q|+|q-r| = d(p,q)+d(q,r),
where the last equality follows from the definition of the distance function. Therefore, all four conditions have been satisfied, and hence by our definition of a metric space, it follows that (\mathbb{R},d) whose distance function d:\mathbb{R}\times \mathbb{R}\rightarrow \mathbb{R} is defined by d(p,q) = |p-q| is indeed a metric space. \square
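As a sanity check (not a substitute for the proof), the four axioms can be tested by brute force on a finite sample of points. The sketch below is my own; the function names and the sample are illustrative assumptions, and integers are used so that the comparisons are exact. It also shows a candidate "distance" that fails the triangle inequality.

```python
from itertools import product

def d(p, q):
    """The metric from the claim: d(p, q) = |p - q|."""
    return abs(p - q)

def squared_difference(p, q):
    """A candidate that is NOT a metric: it violates the triangle inequality."""
    return (p - q) ** 2

def check_metric_axioms(points, dist):
    """Test the four metric axioms on every pair and triple from a finite sample."""
    for p, q in product(points, repeat=2):
        if dist(p, q) < 0:                    # axiom 1: non-negativity
            return False
        if (dist(p, q) == 0) != (p == q):     # axiom 2: zero exactly when p = q
            return False
        if dist(p, q) != dist(q, p):          # axiom 3: symmetry
            return False
    for p, q, r in product(points, repeat=3):
        if dist(p, r) > dist(p, q) + dist(q, r):  # axiom 4: triangle inequality
            return False
    return True

sample = [-3, -1, 0, 2, 5]
print(check_metric_axioms(sample, d))
print(check_metric_axioms(sample, squared_difference))
```

Passing on a finite sample only fails to find a counterexample; it is the proof above that covers all of \mathbb{R}.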

An Introduction…

A bit of background about myself: Since my sophomore year of high school I have been interested in astronomy, physics, and mathematics. I received my Bachelor’s degree in Earth and Planetary Sciences concentrating in astronomy/astrophysics from Western Connecticut State University. My coursework and independent readings ultimately led to minors in mathematics and physics.

My intention for this blog is to serve as a reference in astrophysics and related topics to myself as well as others. I aim to share my own research interests and consider selected problems that have fascinated me. I also hope to communicate recent news in the fields of physics and astronomy and discuss the implications of discoveries made.

DISCLAIMER: I am by no means an expert, and as such the posts that I create are of my opinion and my own logic. I may be wrong sometimes, and I hope that the people who see this (assuming that anyone sees this) will respect that.

That being said… Enjoy!


(ABOVE: An image of the moon taken with a lunar and planetary imaging camera mounted to a Newtonian 130mm reflecting telescope.)

Introduction to Groups


Abstract algebra can be broken into about two or three sections: (1) Groups; (2) Rings and Fields; (3) Vector Spaces. (A fourth topic that could be considered its own is Galois Theory.) The typical way this version of algebra is introduced is to start by covering groups, then to introduce the concepts of rings and fields in the context of group theory. Rings are interesting, but they are not of interest to us right now. In this post, I attempt to define what a group is and to explain what the definition means. I then follow this up with examples of groups using the real numbers, integers, and complex numbers. In the next post, I will attempt to prove that the real numbers and integers are groups under addition and that the nonzero complex numbers form a group under complex multiplication (zero must be excluded because it has no multiplicative inverse). To that end we must introduce the concept of a group.

Definition.  A group G is a non-empty set equipped with a single binary operation \ast that satisfies the following four axioms:

  1.  For all g_{1}, g_{2}\in G, g_{1}\ast g_{2}\in G;
  2. For all g_{1}, g_{2}, g_{3}\in G, g_{1}\ast (g_{2}\ast g_{3}) = (g_{1}\ast g_{2})\ast g_{3};
  3. For all g\in G, there exists an element g^{-1}\in G such that g \ast g^{-1}=e_{G}=g^{-1} \ast g, where e_{G} is the identity element of statement 4;
  4. There exists an element e_{G}\in G such that for all g\in G, g \ast e_{G}=g=e_{G} \ast g.

This definition might not seem very helpful, but it’s the reason as to why we are allowed to use the properties of numbers that we used in high school. What this definition says is that we have a collection of objects that we call elements that is endowed with a binary operation. An example of this would be traditional addition or multiplication. The word binary simply means that it requires two elements to produce another element. For instance, consider the following group G=(\mathbb{Z},+) (we’ll discuss what this notation means later on in the post). Let a=2\in \mathbb{Z} and let b=3\in \mathbb{Z}. We can add these two integers to get another integer, call it c=5\in \mathbb{Z}. This is what we mean by a binary operation.

Now that we understand what the operation is, we can get into what it takes for a set to be a group. Statement 1 simply says that given any two elements a,b in the set, a \ast b is also in the set; in that case we say that the set is closed under the operation \ast. Statement 2 says that given any three elements in the set, the order in which the operation is performed, as dictated by the parentheses, is immaterial. This statement ensures that the operation is associative. To clarify this a bit more, consider the following sum in (\mathbb{Z},+):

2 + (3 + 4)

The second statement says that I can regroup the sum as (2 + 3) + 4 without changing its value. Indeed, in each case one gets 9, and so that’s what statement 2 ensures: associativity. Statement 3 says that there is an inverse element to each element in the set. For addition, this inverse element is the additive inverse, by which I mean that for every a\in G, a+(-a)=e_{G}. For multiplication, the inverse element is the multiplicative inverse, in which case we have that for every element a\in G, a\cdot a^{-1}=e_{G}. In the definition, I was careful to include both the right and left inverse. The reason for this is that not all operations commute; that is, it is not always true that ab=ba. As an example, let M(2,\mathbb{R}) be the set of all 2\times 2 real matrices whose determinant is non-zero, with matrix multiplication as the operation. Matrix multiplication is known to be non-commutative, and so in general \textbf{AB}\neq\textbf{BA}. If it is true that every pair of elements in the group commutes, then G is referred to as an abelian group.

Finally, statement 4 ensures that there is an identity element. In the definition and explanation of statement 3, I denoted this as e_{G}. For addition, the identity element is 0, since given any element of G, if we add 0 to it we get the element again. Furthermore, for multiplication, the identity is 1 since anything multiplied by 1 is itself. Note that for addition and multiplication we have different forms for the inverse and identity elements. Thus, we can write these operations in one of two ways: in additive notation a+b, and in multiplicative notation ab, depending on the operation involved.  If all four of these axioms, as we call them, are satisfied, then the set is a group under the prescribed operation.

The typical examples of groups include the real numbers \mathbb{R}, the integers \mathbb{Z}, and the complex numbers \mathbb{C} (each under addition), to name a few. We denote the structure by stating what set we are considering, followed by the binary operation, written as (G, \ast).
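The four axioms can be checked by brute force when the set is finite. The examples above are infinite sets, so as a stand-in the sketch below uses the integers modulo 5, which is my own choice of example and not one from this post. It also shows why zero has to be excluded when the operation is multiplication.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of the four group axioms on a finite set."""
    elements = list(elements)
    # Statement 1: closure.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Statement 2: associativity.
    if any(op(a, op(b, c)) != op(op(a, b), c)
           for a, b, c in product(elements, repeat=3)):
        return False
    # Statement 4: a two-sided identity element exists.
    identities = [e for e in elements
                  if all(op(g, e) == g == op(e, g) for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # Statement 3: every element has a two-sided inverse.
    return all(any(op(g, h) == e == op(h, g) for h in elements)
               for g in elements)

def add_mod5(a, b):
    return (a + b) % 5

def mul_mod5(a, b):
    return (a * b) % 5

print(is_group(range(5), add_mod5))     # integers mod 5 under addition
print(is_group(range(5), mul_mod5))     # 0 has no multiplicative inverse
print(is_group(range(1, 5), mul_mod5))  # nonzero residues under multiplication
```

The identity is found before the inverses are tested, mirroring the fact that statement 3 only makes sense once the identity of statement 4 is known.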


In the next post, I will look at \mathbb{R}, \mathbb{Z}, \mathbb{C} in detail and I will show how to prove that the reals and integers are groups under addition and that the nonzero complex numbers are a group under complex multiplication.






IMAGE CREDIT/OBTAINED FROM:  https://mappingignorance.org/2015/12/17/einstein-and-quantum-solids/

Quite some time ago, I had posted a numerical study of an Einstein solid and I now present the analytical study of an Einstein solid. As this was one problem in one of my problem sets while studying thermodynamics and statistical mechanics, one may find this exact problem in the following text:

Schroeder D.V.,  An Introduction to Thermal Physics, (2000). Addison Wesley Longman. Chapter 3: Interactions and Implications: Problem 3.25. pp. 108. 

What I will be presenting is my solution to this problem and I will be offering my interpretation of the problem statement and the implications of the solution.


We begin with the provided approximation,

\displaystyle \Omega(N,q)\approx \bigg(\frac{q+N}{q}\bigg)^{q}\bigg(\frac{q+N}{N}\bigg)^{N}.       (1)

This expression represents the multiplicity of an Einstein solid with N oscillators and q energy units. Recall that the equation to find the entropy is the following

\displaystyle S=k\ln{(\Omega(N,q))}.  (2)

Thus, upon substitution of the multiplicity (Eq.(1)) into the equation for entropy (Eq.(2)), one arrives at the equation

\displaystyle  S=k\ln{\bigg\{\bigg(\frac{q+N}{q}\bigg)^{q}\bigg(\frac{q+N}{N}\bigg)^{N}\bigg\}}. (3)

Upon making use of the properties of logarithms, we may write the equation equivalently as

\displaystyle S = k\ln{\bigg\{\bigg(\frac{q+N}{q}\bigg)^{q}\bigg\}}+k\ln{\bigg\{\bigg(\frac{q+N}{N}\bigg)^{N}\bigg\}}, (4)

and again using the well-known property that \ln{(x^{a})}=a\ln{x}, we may further simplify Eq.(4) such that we arrive at the expression for the entropy of an Einstein solid:

\displaystyle S = kq\ln{\bigg\{\bigg(\frac{q+N}{q}\bigg)\bigg\}}+Nk\ln{\bigg\{\bigg(\frac{q+N}{N}\bigg)\bigg\}}. (5)

We may omit the factor of \sqrt{N/(2\pi q(q+N))} that multiplies Eq.(1) in the full Stirling approximation of the multiplicity. Its logarithm grows only like \ln{q} and \ln{N}, whereas the terms retained in Eq.(5) grow like q and N themselves, and if N and q are very large numbers, then \ln{q}<<q and \ln{N}<<N. So we see that the aforementioned factor is of no consequence provided that q and N (i.e. the number of energy units and oscillators) are very large.
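The claim that the omitted prefactor is negligible can be checked numerically. The sketch below assumes the exact multiplicity \Omega(N,q)=\binom{q+N-1}{q}, which is the standard counting formula for an Einstein solid but is not written out in this post, and compares its logarithm with the approximation in Eq.(5) divided by k. The function names are my own.

```python
import math

def ln_multiplicity_exact(N, q):
    """ln of the binomial coefficient C(q + N - 1, q), via log-gamma
    so that the enormous factorials are never formed."""
    return math.lgamma(q + N) - math.lgamma(q + 1) - math.lgamma(N)

def ln_multiplicity_approx(N, q):
    """S / k from Eq.(5): q ln((q + N) / q) + N ln((q + N) / N)."""
    return q * math.log((q + N) / q) + N * math.log((q + N) / N)

def ln_prefactor(N, q):
    """ln of the omitted factor sqrt(N / (2 pi q (q + N)))."""
    return 0.5 * math.log(N / (2 * math.pi * q * (q + N)))

N = q = 10**6
exact = ln_multiplicity_exact(N, q)
approx = ln_multiplicity_approx(N, q)
relative_gap = abs(exact - approx) / exact

print(exact - approx, ln_prefactor(N, q), relative_gap)
```

The difference between the exact and approximate logarithms is essentially the logarithm of the prefactor, and it is a tiny fraction of the entropy itself at these sizes.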

The second part of this problem asks to take the expression we derived for the entropy and compute the temperature. Recall that the definition for temperature is given by the equation

\displaystyle T=\bigg(\frac{\partial S}{\partial U}\bigg)^{-1}. (6)

From substitution we may write the following

\displaystyle T=\bigg\{\frac{\partial}{\partial U}\bigg(k\bigg(\frac{U}{\epsilon}+N\bigg)\ln{\bigg(\frac{U}{\epsilon}+N\bigg)}\bigg)-\frac{\partial}{\partial U}\bigg(\frac{kU}{\epsilon}\ln{\bigg(\frac{U}{\epsilon}\bigg)}\bigg)-\frac{\partial}{\partial U} \big(kN\ln{(N)}\big)\bigg\}^{-1}, (7)

where I have made use of the properties of logarithms and made the substitution q=U/\epsilon. Differentiating and simplifying yields,

\displaystyle T=\bigg\{\frac{k}{\epsilon}\bigg[\ln{\bigg(\frac{U}{\epsilon}+N\bigg)}-\ln{\bigg(\frac{U}{\epsilon}\bigg)}\bigg]\bigg\}^{-1}. (8)

Further simplification yields the equation for temperature T of an Einstein solid

\displaystyle T = \frac{\epsilon}{k\ln{\bigg(\frac{U+\epsilon N}{U}\bigg)}}. (9)
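A quick numerical cross-check of Eq.(9): the sketch below differentiates the entropy of Eq.(5) with a central finite difference and compares the reciprocal of that slope with the closed-form temperature. The units k=\epsilon=1 and the sample values of N and U are arbitrary assumptions chosen only for illustration.

```python
import math

def entropy(U, N, eps=1.0, k=1.0):
    """Eq.(5) with q = U / eps."""
    q = U / eps
    return k * q * math.log((q + N) / q) + N * k * math.log((q + N) / N)

def temperature(U, N, eps=1.0, k=1.0):
    """Eq.(9): T = eps / (k ln((U + eps N) / U))."""
    return eps / (k * math.log((U + eps * N) / U))

def temperature_from_slope(U, N, h=1e-4, eps=1.0, k=1.0):
    """T = (dS/dU)^(-1), with dS/dU estimated by a central difference."""
    dS_dU = (entropy(U + h, N, eps, k) - entropy(U - h, N, eps, k)) / (2 * h)
    return 1.0 / dS_dU

N, U = 50.0, 30.0
print(temperature(U, N), temperature_from_slope(U, N))
```

The two values agree to within the accuracy of the finite difference, which supports the simplification leading to Eq.(9).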


Part three asks us to find the equation for the heat capacity from our temperature equation. Recall that the equation for heat capacity is of the form

\displaystyle C = \bigg(\frac{\partial U}{\partial T}\bigg). (10)

However, we need an equation for the internal energy U as a function of temperature T. We actually have the opposite. So we must solve Eq.(9) for the internal energy, and doing so yields

\displaystyle U(T)=\frac{\epsilon N}{\exp{(\epsilon/kT)}-1}. (11)


Substituting Eq.(11) into Eq.(10) gives

\displaystyle C = \bigg(\frac{\partial U}{\partial T}\bigg)=\frac{\partial}{\partial T}\bigg(\frac{\epsilon N}{\exp{(\epsilon/kT)}-1}\bigg). (12)

Using the quotient rule for derivatives gives the equation for the heat capacity

\displaystyle C =\bigg(\frac{\epsilon^{2}N}{kT^{2}}\bigg) \bigg(\frac{\exp{(\epsilon/kT)}}{{(\exp{(\epsilon/kT)}-1)}^{2}}\bigg). (13)


The next part asks us to show that in the limit T \rightarrow \infty, the heat capacity C \rightarrow Nk. Recall that the Taylor series expansion for the exponential function \exp{(x)} is given by

\displaystyle \exp{(x)}=\sum_{j=0}^{\infty}\frac{x^{j}}{j!}. (14)

For small values of x we may make the approximation \exp{(x)}\approx 1+ x. Here x=\epsilon/kT, which is small when T is large. Then we have the approximate relation

\displaystyle C \approx \frac{\epsilon^{2}N}{kT^{2}}\frac{(1+\epsilon/kT)}{(1+(\epsilon/kT)-1)^{2}}=\frac{\epsilon^{2}N(1+(\epsilon/kT))}{kT^{2}(\epsilon^{2}/k^2 T^2)}= Nk(1+(\epsilon/kT)). (15)

Then considering the aforementioned limit yields

\displaystyle \lim_{T \rightarrow \infty} C = \lim_{T \rightarrow \infty} \bigg(Nk(1+(\epsilon/kT))\bigg)= Nk. (16)
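Both the derivative in Eq.(13) and the limit in Eq.(16) can be checked numerically. The sketch below works in units where k=\epsilon=1 and N=1 by default, so that the function returns C/Nk; these units and the function names are my own assumptions for illustration.

```python
import math

def energy(T, N=1.0, eps=1.0, k=1.0):
    """Eq.(11): U(T) = eps N / (exp(eps / kT) - 1)."""
    return eps * N / math.expm1(eps / (k * T))

def heat_capacity(T, N=1.0, eps=1.0, k=1.0):
    """Eq.(13), written with x = eps / kT as N k x^2 e^x / (e^x - 1)^2."""
    x = eps / (k * T)
    return N * k * x**2 * math.exp(x) / math.expm1(x) ** 2

def heat_capacity_numeric(T, h=1e-5, **params):
    """C = dU/dT estimated with a central finite difference of Eq.(11)."""
    return (energy(T + h, **params) - energy(T - h, **params)) / (2 * h)

# The closed form agrees with the numerical derivative of U(T).
print(heat_capacity(0.5), heat_capacity_numeric(0.5))

# C / Nk approaches 1 at high temperature and 0 at low temperature.
for T in (0.05, 0.5, 5.0, 50.0, 1000.0):
    print(T, heat_capacity(T))
```

Here math.expm1(x) computes \exp{(x)}-1 accurately for small x, which matters in the high-temperature regime where x=\epsilon/kT is tiny.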

The final part of my solution to this problem (I did not complete the last portion of the problem; see the aforementioned reference for the full problem statement) asks us to graph the resultant equation relating the heat capacity and the temperature. Below is a plot of the function produced in the technical computing software Maple.


Heat Capacity Plot

Fig.1 Heat Capacity vs. Temperature

For low temperatures, the heat capacity initially starts at 0. However, when t=0.097, there appears to be a dramatic increase in the heat capacity in the dimensionless quantity C/Nk. If the heat capacity C is graphed as a function of temperature T, and one uses the \epsilon values for each of the lead and aluminum curves produced in Fig. 1.14 of Schroeder’s An Introduction to Thermal Physics. 

Let me know if I made any mistakes anywhere, and I will do my best to correct them.


Clear skies!



Basics of Tensor Calculus & General Relativity|A Digression into Special Relativity

So far in this series I have given the definitions of vectors, scalars, tensors, and manifolds. As a result, much of this series has been mostly mathematics and not necessarily physics. To that end, the purpose of this post is to develop the salient points of special relativity. Namely, the intention of this post is to cover the following:

  1. Definition of Inertial Reference Frames: Standard Configuration and Einstein’s Postulates.
  2. Development of the Lorentz Transformation Matrix
  3. Discussion of the Newtonian geometry of spacetime
  4. Discussion of the Minkowski geometry of spacetime (i.e. no curvature)
  5. Finally I will show that the quantity \delta s^{2} is invariant with respect to Lorentz transformations. This is a pretty standard problem in most GR textbooks and in fact in some introductory books on SR.


Basics of Tensor Calculus and General Relativity: An Introduction to Manifolds and Coordinates

SOURCE FOR CONTENT: General Relativity: An Introduction for Physicists, Hobson, M.P., Efstathiou, G., and Lasenby, A.N., 2006. Cambridge University Press.

D-Dimensional Hypersphere and Gamma Function: An Introduction to Thermal Physics, Schroeder D.V. 2000. Addison Wesley Longman.


The intended purpose of the post is to introduce the concept of manifolds in the context of physics (mathematicians beware!). Furthermore, I will discuss the concepts of Riemannian and pseudo-Riemannian manifolds before moving on towards tensors. This will be the first post in this topic of the series. In order to properly discuss the concepts of general relativity, I will have to break up this part of the series into smaller posts.


Part I. In this part of the series, I will discuss the concept of a tensor, and then discuss the introductory topics of manifolds.

A topic that has long eluded a conceptual understanding on my part is a tensor. In the first post of the series we saw a very technical and quite frustrating definition of a tensor. I have read numerous treatments and watched a number of lectures and videos and based on everything I have encountered, here is my understanding of a tensor as of writing this post…

Tensors are geometric objects that can be viewed in a similar way as one views matrices, whose elements are components of the tensor, and will have an overall value. More specifically, a tensor will take two geometric objects as inputs and will give you a scalar (a real number). Furthermore, under a transformation or in a different reference frame, the scalar that the tensor outputs will remain the same in all frames. The objects that change are, in fact, the components of the tensor. These components must obey specific transformation equations so as to preserve the scalar quantity produced by the tensor. In a more mathematical sense, a tensor is a mapping of a number of vectors (including 1-forms and the like) into the ordered field of real numbers.

(**This is the best definition that I could come up with in order to define in a more satisfactory way a tensor.**)


According to the aforementioned reference, a manifold in its most general sense, is any set that one can describe by specifying parameters continuously. In the context of physics, we deal with differentiable manifolds.

A differentiable manifold is a continuous collection of points where each point is differentiable. This definition isn’t any better than the initial definition of a tensor. So, to elaborate a bit, we shall define the concept of continuity: a manifold is continuous if in the local region of a point n_{1}, there exist points whose difference relative to n_{1} is dn.

From this, we can say that a differentiable manifold is a manifold for which we can ascribe to it a scalar field containing points at which it is possible to take derivatives of all orders.

Some examples include: 3-Dimensional Euclidean Space and Phase Space.

3-Dimensional Space

This differentiable manifold requires three coordinates (parameters) to specify a single point in the space, so the dimension of this particular manifold is 3. Mathematicians sometimes call this 3-space.

Phase Space

This is a manifold that one encounters more often in physics. I came across this manifold (although I did not refer to it as such) while taking my thermodynamics and statistical mechanics course. For a single particle in three dimensions, this manifold requires 6 parameters in order to specify any point. Typically, these parameters include positions (or one radius vector) and velocities or momenta.

A submanifold or surface that I found applicable to phase space would be the d-dimensional hypersphere whose surface area is given by

\displaystyle A_{d}(r)=\frac{2\pi^{d/2}}{(\frac{d}{2}-1)!}r^{d-1}=\frac{2\pi^{d/2}}{\Gamma(\frac{d}{2})}r^{d-1}, (1)

where \Gamma is the gamma function, defined by

\displaystyle \Gamma(d+1)\equiv \int_{0}^{\infty}x^{d}\exp{(-x)}dx. (2)

To be more precise, the surface area (Eq.(1)) of this hypersphere is technically the volume of momentum space, but I am including it to present a more concrete example of a manifold.
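As a quick sanity check on Eq.(1), here is a small Python sketch that evaluates the surface area with the standard library's `math.gamma`. The function name `hypersphere_surface_area` is my own choice for illustration. For d = 2 the formula should reduce to the circumference of a circle, and for d = 3 to the familiar surface area of a sphere.

```python
import math

def hypersphere_surface_area(d, r):
    """Surface area A_d(r) = 2 pi^(d/2) / Gamma(d/2) * r^(d-1) from Eq.(1)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0) * r ** (d - 1)

r = 1.5  # arbitrary radius chosen for the check

circumference = hypersphere_surface_area(2, r)  # should match 2 pi r
sphere_area = hypersphere_surface_area(3, r)    # should match 4 pi r^2

print(circumference, 2.0 * math.pi * r)
print(sphere_area, 4.0 * math.pi * r ** 2)
```

Using the gamma function rather than the factorial form is what lets the same expression work for odd d, where d/2 - 1 is not an integer.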

The next post will make the concepts mentioned here a bit more quantitative. This post was really just to introduce a more conceptual understanding.


Clear skies!


Derivation of the Euler-Lagrange Equation for a Function of Several Dependent Variables


SOURCE FOR CONTENT: Classical Dynamics of Particles and Systems. Thornton and Marion. 5th Edition. 


Consider a functional

\displaystyle \phi = \phi(y_{\mu},y_{\mu}^{\prime}; x), (1)

where \mu = 1,2,...,n. By the method used in a previous section of the aforementioned text, we may write

\displaystyle y_{\mu}(\alpha, x) = y_{\mu}(0,x) +\alpha \eta_{\mu}(x). (2)

Additionally, we will find it useful to define

\displaystyle y_{\mu}^{\prime}(\alpha,x) = y_{\mu}^{\prime}(0,x)+\alpha \eta_{\mu}^{\prime}(x). (3)

Further we may also define an integral functional by way of integrating Eq.(1) over the interval x_{1}\leq x \leq x_{2}, and introducing a variational parameter \alpha we have

\displaystyle J(\alpha) = \int_{x_{1}}^{x_{2}} \phi(y_{\mu},y_{\mu}^{\prime};x)dx. (4)

Two necessary conditions that are used to derive the Euler-Lagrange equation include

\displaystyle \frac{\partial J(\alpha)}{\partial \alpha}\bigg|_{\alpha=0}=0, (5)


\displaystyle \eta_{\mu}(x_{1})=\eta_{\mu}(x_{2})=0. (6)

Let us take the derivative of J(\alpha) with respect to \alpha yielding

\displaystyle \frac{\partial J}{\partial \alpha}=\frac{\partial}{\partial \alpha}\int_{x_{1}}^{x_{2}}\phi(y_{\mu},y_{\mu}^{\prime};x)dx. (7)

Carrying out the derivative operator on the right-hand-side of Eq.(7) we get

\displaystyle \frac{\partial J}{\partial \alpha}=\int_{x_{1}}^{x_{2}}\sum_{\mu}\bigg\{\partial_{y_{\mu}}\phi \partial_{\alpha}y_{\mu}+\partial_{y_{\mu}^{\prime}}\phi \partial_{\alpha}y_{\mu}^{\prime}\bigg\}dx. (8)

From Eqs.(2) and (3) we see that

\displaystyle \partial_{\alpha}y_{\mu}= \eta_{\mu}(x), (9)


\displaystyle \partial_{\alpha}y_{\mu}^{\prime} = \eta_{\mu}^{\prime}(x). (10)

Thus Eq.(8) becomes…

\displaystyle \frac{\partial J}{\partial \alpha}=\int_{x_{1}}^{x_{2}}\sum_{\mu}\bigg\{\partial_{y_{\mu}}\phi \eta_{\mu}(x)+\partial_{y_{\mu}^{\prime}}\phi \eta_{\mu}^{\prime}(x)\bigg\}dx. (11)

Consider the second term under the summation. Integrating it by parts gives

\displaystyle \int_{x_{1}}^{x_{2}}\partial_{y_{\mu}^{\prime}}\phi \, \eta_{\mu}^{\prime}(x)dx=\bigg[\partial_{y_{\mu}^{\prime}}\phi \, \eta_{\mu}(x)\bigg]_{x_{1}}^{x_{2}}-\int_{x_{1}}^{x_{2}}\frac{d}{dx}(\partial_{y_{\mu}^{\prime}}\phi)\eta_{\mu}(x)dx,

where the boundary term vanishes by the condition \eta_{\mu}(x_{1})=\eta_{\mu}(x_{2})=0 (Eq.(6)). Thus we obtain the following

\displaystyle \frac{\partial J}{\partial \alpha}=\int_{x_{1}}^{x_{2}}\sum_{\mu}\bigg\{\partial_{y_{\mu}}\phi -\frac{d}{dx}(\partial_{y_{\mu}^{\prime}}\phi) \bigg\}\eta_{\mu}(x)dx. (12)

By the necessary condition (Eq.(5)), this integral must vanish at \alpha=0. Since the functions \eta_{\mu}(x) are arbitrary and independent of one another, the terms in the brackets must vanish for each \mu separately, that is

\displaystyle \partial_{y_{\mu}}\phi -\frac{d}{dx}(\partial_{y_{\mu}^{\prime}}\phi)=0, \quad \mu = 1,2,...,n, (13)

which is the Euler-Lagrange Equation for several dependent variables.
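The stationarity condition Eq.(5) can also be checked numerically. The Python sketch below is only an illustration under assumptions of my own: it picks \phi = \frac{1}{2}\sum_{\mu}(y_{\mu}^{\prime 2}-y_{\mu}^{2}) with two dependent variables on the interval from 0 to 1, chooses two variations \eta_{\mu} that vanish at the endpoints, and estimates \partial J/\partial\alpha at \alpha=0 with a central difference. For this \phi the stationary paths satisfy y_{\mu}^{\prime\prime}=-y_{\mu}, so sines and cosines should give a derivative close to zero, while an arbitrary path should not. None of the function names come from the text.

```python
import math

def phi(y, yp):
    """Assumed example integrand: 1/2 * sum(y'^2) - 1/2 * sum(y^2)."""
    return 0.5 * sum(v * v for v in yp) - 0.5 * sum(u * u for u in y)

def J(alpha, paths, etas, x1=0.0, x2=1.0, n=2000):
    """Midpoint-rule estimate of J(alpha) for the varied paths of Eqs.(2), (3)."""
    h = (x2 - x1) / n
    total = 0.0
    for k in range(n):
        x = x1 + (k + 0.5) * h
        y, yp = [], []
        for (f, fp), (eta, etap) in zip(paths, etas):
            y.append(f(x) + alpha * eta(x))      # Eq.(2)
            yp.append(fp(x) + alpha * etap(x))   # Eq.(3)
        total += phi(y, yp) * h
    return total

def dJ_dalpha_at_zero(paths, etas, step=1e-3):
    """Central-difference estimate of dJ/d(alpha) at alpha = 0."""
    return (J(step, paths, etas) - J(-step, paths, etas)) / (2.0 * step)

# Variations that vanish at x = 0 and x = 1, as required by Eq.(6).
etas = [
    (lambda x: math.sin(math.pi * x), lambda x: math.pi * math.cos(math.pi * x)),
    (lambda x: x * (1.0 - x), lambda x: 1.0 - 2.0 * x),
]

# Paths given as (y, y') pairs: these satisfy y'' = -y.
stationary_paths = [(math.sin, math.cos), (math.cos, lambda x: -math.sin(x))]
# A comparison path that does not satisfy y'' = -y.
other_paths = [(lambda x: x, lambda x: 1.0), (lambda x: x * x, lambda x: 2.0 * x)]

dJ_stationary = dJ_dalpha_at_zero(stationary_paths, etas)
dJ_other = dJ_dalpha_at_zero(other_paths, etas)
print(dJ_stationary, dJ_other)
```

The first number should be zero up to the error of the quadrature, and the second should be clearly non-zero, which is exactly the distinction the necessary condition is meant to capture.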

Update: the next posts will be those discussed on my Facebook page. Namely, I intend to continue with my Research Series and my series in Tensor Calculus and General Relativity with various ancillary posts in my Astrophysics Series.


Clear Skies!

Astrophysics Series: Derivation of the Total Energy of a Binary Orbit

SOURCE FOR CONTENT: An Introduction to Modern Astrophysics, Carroll & Ostlie, Cambridge University Press. Ch.2 Celestial Mechanics

Here is my solution to one of the problems in the aforementioned text. I derive the total energy of a binary system making use of center-of-mass coordinates. In order to conceptualize it I have used the binary Alpha Centauri A and Alpha Centauri B. While writing this I stumbled upon the Kepler problem, the two-body problem, and the N-body problem. Leave a comment if you would like me to consider that in another post.

Clear Skies!

Derivation of the Total Energy of a Binary Orbit:

Setup: Consider the nearest binary star system to our solar system: Alpha Centauri A and Alpha Centauri B. These two stars orbit each other about a common center of mass, a point called the barycenter. The orbital radius vector of Alpha Centauri A is \textbf{r}_{1} and the orbital radius vector of Alpha Centauri B is \textbf{r}_{2}. The masses of Alpha Centauri A and B are m_{1} and m_{2}, respectively. The total mass of the binary system, M, is the sum of the individual masses of each component. In the context of this system, we encounter what is called the two-body problem, of which there exists a special case known as the Kepler Problem (by the way let me know if that would be something that you guys would want to see…). We can simplify this two-body problem by making use of center-of-mass coordinates, wherein we define the reduced mass \mu. Therefore, the derivation of the total energy of the binary system of Alpha Centauri A and B will be carried out in such a coordinate system.

To derive this energy equation, one would typically make use of center-of-mass coordinates in which

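\displaystyle \textbf{r}_{1}=-\frac{\mu}{m_{1}}\textbf{r},  (0.1)


\displaystyle \textbf{r}_{2}=\frac{\mu}{m_{2}}\textbf{r}, (0.2)

where \textbf{r}=\textbf{r}_{2}-\textbf{r}_{1} is the separation vector between the two stars and \mu represents the reduced mass given by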

\displaystyle \mu\equiv \frac{m_{1}m_{2}}{m_{1}+m_{2}}=\frac{m_{1}m_{2}}{M}. (0.3)

Recall from conservation of energy that

\displaystyle E = \frac{1}{2}m_{1}\dot{r}_{1}^{2}+\frac{1}{2}m_{2}\dot{r}_{2}^{2}-G\frac{m_{1}m_{2}}{|\mathcal{R}|}, (1)

where |\mathcal{R}| represents the separation distance between the two components. Let us take the time derivative of Eqs.(0.1) and (0.2), writing v for the time derivative of r, to get

\displaystyle \dot{r}_{1}=-\frac{\mu}{m_{1}}v, (2.1)


\displaystyle \dot{r}_{2}= \frac{\mu}{m_{2}}v. (2.2)

Substitution yields

\displaystyle E = \frac{1}{2}\frac{\mu^{2}}{m_{1}}v^{2}+\frac{1}{2}\frac{\mu^{2}}{m_{2}}v^{2}-G\frac{m_{1}m_{2}}{|\mathcal{R}|}. (3)

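Upon making use of the definition of the reduced mass (Eq. (0.3)), the two kinetic terms combine since \frac{\mu^{2}}{m_{1}}+\frac{\mu^{2}}{m_{2}}=\mu^{2}\frac{M}{m_{1}m_{2}}=\mu, and solving Eq.(0.3) for the product of the masses gives m_{1}m_{2}=M\mu. We therefore arrive at

\displaystyle E = \frac{1}{2}\mu v^{2}-G\frac{M \mu}{|\mathcal{R}|}. (4)

This is the total energy of the binary Alpha Centauri A and B. The result holds for any binary system described in center-of-mass coordinates.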
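The algebra leading from Eq.(1) to Eq.(4) can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are my own, and the masses, relative speed, and separation are arbitrary illustrative numbers in units where G = 1, not measured values for Alpha Centauri.

```python
import math

def reduced_mass(m1, m2):
    """Reduced mass from Eq.(0.3)."""
    return m1 * m2 / (m1 + m2)

def energy_two_body(m1, m2, v, separation, G=1.0):
    """Total energy from Eq.(1), using the velocities of Eqs.(2.1) and (2.2)."""
    mu = reduced_mass(m1, m2)
    r1_dot = -mu / m1 * v
    r2_dot = mu / m2 * v
    kinetic = 0.5 * m1 * r1_dot ** 2 + 0.5 * m2 * r2_dot ** 2
    potential = -G * m1 * m2 / separation
    return kinetic + potential

def energy_reduced(m1, m2, v, separation, G=1.0):
    """Total energy from Eq.(4), written with the reduced mass and total mass."""
    mu = reduced_mass(m1, m2)
    total_mass = m1 + m2
    return 0.5 * mu * v ** 2 - G * total_mass * mu / separation

# Arbitrary illustrative inputs (assumed, in units with G = 1).
m1, m2 = 1.1, 0.9
v, separation = 0.7, 2.5

E_full = energy_two_body(m1, m2, v, separation)
E_reduced = energy_reduced(m1, m2, v, separation)
print(E_full, E_reduced)  # the two forms agree to rounding error
```

Both functions return the same value for any positive masses and separation, which is just the statement that Eq.(4) is Eq.(1) rewritten in center-of-mass coordinates.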

