Commit 33f7e6a5 authored by schmittu

cleaned all notebooks

parent 238856f2
Showing
with 15975 additions and 17313 deletions
%% Cell type:code id: tags:
``` python
# IGNORE THIS CELL WHICH CUSTOMIZES LAYOUT AND STYLING OF THE NOTEBOOK !
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings = lambda *a, **kw: None
from IPython.core.display import HTML
HTML(open("custom.html", "r").read())
```
%% Cell type:markdown id: tags:
# Chapter 0: Introduction
<div class="alert alert-block alert-warning">
<i class="fa fa-warning"></i>&nbsp;This script introduces <code>numpy</code>, <code>pandas</code>, <code>matplotlib</code> and <code>seaborn</code> only as far as we use them in this course.
Thus it is not a comprehensive introduction to these libraries!
</div>
%% Cell type:markdown id: tags:
## pandas
`pandas` allows handling tabular data as so-called `DataFrame`s. Tabular data means that columns have types: within a column all values are of the same type, but types can differ between columns.
%% Cell type:markdown id: tags:
### Some basics
%% Cell type:code id: tags:
``` python
# show content of csv file
print(open("data/example.csv").read())
```
%% Cell type:code id: tags:
``` python
# read file with pandas
import pandas as pd
df = pd.read_csv("data/example.csv")
print(df)
```
%% Cell type:markdown id: tags:
<div class="alert alert-block alert-info">
<i class="fa fa-warning"></i>&nbsp;<code>pandas</code> also
supports reading and writing of other file formats, like <code>.xlsx</code>, <code>.hdf5</code> or <code>sqlite3</code> files.
</div>
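As a hedged illustration of the note above, the following sketch writes a small data frame to an in-memory `sqlite3` database and reads it back. The values and the table name `example` are made up for this illustration and are not taken from `data/example.csv`.
%% Cell type:code id: tags:
```python
import sqlite3

import pandas as pd

# made-up data, only for illustration
small = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# ":memory:" creates a temporary database that only lives in RAM
connection = sqlite3.connect(":memory:")
small.to_sql("example", connection, index=False)
restored = pd.read_sql("SELECT * FROM example", connection)
connection.close()
print(restored)
```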
%% Cell type:code id: tags:
``` python
df.info()
```
%% Cell type:markdown id: tags:
You can see that the columns `a`, `b` and `c` have the different types `int64`, `float64` and `object`. The latter can be read as "anything but a number".
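The column types can also be inspected directly with `.dtypes`. The sketch below uses a made-up data frame (an assumption, not the course data) with one integer, one float and one text column. How the text column is reported may depend on the installed `pandas` version.
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up columns: integers, floats and text
small = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["one", "two"]})

# .dtypes lists the type of every column
print(small.dtypes)
```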
%% Cell type:code id: tags:
``` python
# number of rows and columns
print(df.shape)
```
%% Cell type:markdown id: tags:
`.shape` is a tuple `(number of rows, number of columns)`.
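Since `.shape` is a tuple, it can be unpacked into two variables. A minimal sketch with a made-up data frame (the names `small`, `n_rows` and `n_cols` are chosen for illustration only):
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up data frame with 3 rows and 2 columns
small = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

n_rows, n_cols = small.shape
print(n_rows, "rows and", n_cols, "columns")
```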
%% Cell type:markdown id: tags:
To show the first 5 rows of a data frame we can use `.head()`.
%% Cell type:code id: tags:
``` python
print(df.head())
```
%% Cell type:markdown id: tags:
And `.tail()` shows the last 5 rows:
%% Cell type:code id: tags:
``` python
print(df.tail())
```
%% Cell type:markdown id: tags:
Both accept an integer to change the number of rows to show:
%% Cell type:code id: tags:
``` python
print(df.head(3))
```
%% Cell type:markdown id: tags:
Compute some statistics on the numerical columns:
%% Cell type:code id: tags:
``` python
print(df.describe())
```
%% Cell type:markdown id: tags:
### Accessing parts of a data frame
%% Cell type:markdown id: tags:
We can access separate columns using a column name:
%% Cell type:code id: tags:
``` python
print(df["a"])
```
%% Cell type:markdown id: tags:
Single columns are `Series` in `pandas`:
%% Cell type:code id: tags:
``` python
print(type(df["a"]))
```
%% Cell type:code id: tags:
``` python
scores = df["a"] + 2 * df["b"]
print(scores)
```
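%% Cell type:markdown id: tags:
Arithmetic on `Series` works element by element. The following sketch repeats the computation above with two made-up `Series`, so that the result can be checked by hand. The names `first`, `second` and `demo_scores` are chosen for illustration only.
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up values, chosen so that the result is easy to verify
first = pd.Series([1, 2, 3])
second = pd.Series([0.5, 1.0, 1.5])

# computed per row: 1 + 2 * 0.5, 2 + 2 * 1.0, 3 + 2 * 1.5
demo_scores = first + 2 * second
print(demo_scores.tolist())  # [2.0, 4.0, 6.0]
```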
%% Cell type:markdown id: tags:
<div class="alert alert-block alert-warning">
<i class="fa fa-warning"></i>&nbsp;Don't forget that
<ul>
<li> Indexing in Python starts with <code>0</code>
</li>
<li> Upper limits are exclusive
</li>
<li> Negative indices start from the right end, <code>-1</code> is the last element, <code>-2</code> the one before, etc.</li>
<li> <code>:</code> refers to all elements.</li>
</ul>
</div>
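The rules listed above can be tried out on a plain Python list. The list `values` is made up for this illustration.
%% Cell type:code id: tags:
```python
values = [10, 20, 30, 40, 50]

print(values[0])    # first element: 10
print(values[1:3])  # upper limit 3 is exclusive: [20, 30]
print(values[-1])   # last element: 50
print(values[-2])   # the one before the last: 40
print(values[:])    # all elements: [10, 20, 30, 40, 50]
```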
%% Cell type:markdown id: tags:
`df.iloc[row_slice, col_slice]` offers index-based (positional) access:
%% Cell type:code id: tags:
``` python
print(df.iloc[:, 0])
```
%% Cell type:markdown id: tags:
To extract rows `1` to `2` (included), and all columns except the last one:
%% Cell type:code id: tags:
``` python
print(df.iloc[1:3, :-1])
```
%% Cell type:markdown id: tags:
To extract the last column for the same rows:
%% Cell type:code id: tags:
``` python
print(df.iloc[1:3, -1])
```
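%% Cell type:markdown id: tags:
To make the slicing rules of `.iloc` easier to follow, here is a sketch on a made-up data frame `grid` (an assumption for illustration) whose values reveal which rows and columns were selected.
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up data frame with 4 rows and 3 columns
grid = pd.DataFrame(
    {"p": [0, 1, 2, 3], "q": [10, 11, 12, 13], "r": [20, 21, 22, 23]}
)

# rows 1 and 2, all columns except the last one
part = grid.iloc[1:3, :-1]
print(part)

# all rows of the last column
last_column = grid.iloc[:, -1]
print(last_column.tolist())  # [20, 21, 22, 23]
```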
%% Cell type:markdown id: tags:
### Filtering a data frame
%% Cell type:code id: tags:
``` python
# all rows where the value of a is smaller than 10:
print(df[df["a"] < 10])
```
%% Cell type:markdown id: tags:
This works as follows:
%% Cell type:code id: tags:
``` python
flags = df["a"] > 3
# we see that flags is a vector with logical values depending on
# the given condition "a > 3":
print(flags)
```
%% Cell type:code id: tags:
``` python
# when we pass these logical values to "df[...]" only the "True rows"
# remain:
print(df[flags])
```
%% Cell type:markdown id: tags:
Another example:
%% Cell type:code id: tags:
``` python
print(df[df["c"] == "one"])
```
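%% Cell type:markdown id: tags:
Conditions can also be combined with `&` (and) and `|` (or). Each condition needs its own parentheses. The sketch below uses a made-up data frame `demo`, not the course data.
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up data, only for illustration
demo = pd.DataFrame({"a": [1, 5, 12, 7], "c": ["one", "two", "one", "three"]})

# rows where a > 3 AND c equals "one"
both = (demo["a"] > 3) & (demo["c"] == "one")
print(demo[both])

# rows where a > 10 OR c equals "two"
either = (demo["a"] > 10) | (demo["c"] == "two")
print(demo[either])
```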
%% Cell type:markdown id: tags:
### Extending a dataframe
Adding a new, computed column:
%% Cell type:code id: tags:
``` python
# values in new column d will be values from "a" squared:
df["d"] = df["a"] ** 2
print(df.head())
```
%% Cell type:markdown id: tags:
We can also overwrite a column. Here we use `apply` to apply the same function to all values in the given column:
%% Cell type:code id: tags:
``` python
def increment(v):
    return v + 1
df["d"] = df["d"].apply(increment)
print(df.head())
```
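%% Cell type:markdown id: tags:
For simple arithmetic like this, the operation can also be written directly on the column, which gives the same result as `apply`. A sketch with a made-up `Series` (the names `column` and `add_one` are chosen for illustration only):
%% Cell type:code id: tags:
```python
import pandas as pd

# made-up values, only for illustration
column = pd.Series([1, 4, 9])


def add_one(v):
    return v + 1


via_apply = column.apply(add_one)
vectorized = column + 1

# both approaches produce identical values
print(via_apply.equals(vectorized))  # True
```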
%% Cell type:markdown id: tags:
## numpy
`numpy` offers data structures from linear algebra, e.g. vectors and matrices.
In contrast to a `pd.DataFrame`, all entries of a matrix have the same type.
%% Cell type:code id: tags:
``` python
import numpy as np
x = np.array([3.0, 5.0, 8.0])
print(x)
```
%% Cell type:code id: tags:
``` python
print(x.shape)
```
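%% Cell type:markdown id: tags:
That all entries of an array share one type can be seen when integers and floats are mixed: `numpy` converts all of them to floats. A minimal sketch with made-up values:
%% Cell type:code id: tags:
```python
import numpy as np

# integers and a float are mixed, numpy stores all entries as floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
print(mixed)
```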
%% Cell type:code id: tags:
``` python
A = np.array(
    [
        [1.0, 2.0, 3.0],
        [3.0, 4.0, 5.0],
        [3.0, 5.0, 3.0],
    ]
)
print(A)
```
%% Cell type:code id: tags:
``` python
print(A.shape)
```
%% Cell type:markdown id: tags:
Indexed access works as usual:
%% Cell type:code id: tags:
``` python
print(x[0])
print(x[-1])
print(x[1:])
```
%% Cell type:code id: tags:
``` python
print(A[1, 0])
print(A[:, 1])
```
%% Cell type:markdown id: tags:
`numpy` applies arithmetic operations and functions element-wise:
%% Cell type:code id: tags:
``` python
# caveat: this is element-wise, not matrix-matrix multiplication
print(A * A)
```
%% Cell type:code id: tags:
``` python
# this is matrix-matrix multiplication:
print(A @ A)
```
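%% Cell type:markdown id: tags:
The difference between `*` and `@` is easier to verify by hand on a small matrix. The matrix `M` below is made up for this illustration.
%% Cell type:code id: tags:
```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# element-wise: every entry is multiplied with itself
print(M * M)  # [[1, 4], [9, 16]]

# matrix-matrix multiplication: rows times columns
print(M @ M)  # [[7, 10], [15, 22]]
```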
%% Cell type:code id: tags:
``` python
# subtract 3 from all elements:
print(A - 3)
```
%% Cell type:code id: tags:
``` python
# subtract 3 from all elements, then compute absolute
# values for every element:
print(np.abs(A - 3))
```
%% Cell type:code id: tags:
``` python
x = np.linspace(0, 8, 11)
print(x)
print(len(x))
```
%% Cell type:code id: tags:
``` python
# we can also filter values:
print(x[x < 2])
```
%% Cell type:markdown id: tags:
In computations like addition, `True` is handled as `1` and `False` as `0`.
%% Cell type:code id: tags:
``` python
p = np.sum(x < 2)
print(p)
print(p / len(x) * 100, "percent of entries in x are smaller than 2")
```
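%% Cell type:markdown id: tags:
Because `True` counts as `1`, the mean of a vector of logical values is the fraction of `True` entries. The sketch below recomputes the percentage this way. The names `samples` and `mask` are chosen for illustration only.
%% Cell type:code id: tags:
```python
import numpy as np

samples = np.linspace(0, 8, 11)
mask = samples < 2

# number of entries smaller than 2
print(np.sum(mask))

# fraction of True entries, expressed in percent
print(np.mean(mask) * 100, "percent of entries are smaller than 2")
```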
%% Cell type:markdown id: tags:
## About plotting
We use `matplotlib` and also `seaborn` in the script. `seaborn` is a layer on top of `matplotlib` offering some easy-to-use standard plots and also a more modern layout and styling.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
x = np.linspace(1, 4, 4)
y0 = np.mod(x, 2)
y1 = 2 * (1 - y0)
y2 = np.sqrt(x)
plt.plot(x, y0) # default color is blue
plt.plot(x, y1, color="chocolate", marker="o")
# no lines, marker size is 150:
plt.scatter(x, y2, color="steelblue", marker="*", s=150);
```
%% Cell type:code id: tags:
``` python
plt.plot(x, y0, label="one")
plt.plot(x, y1, color="chocolate", marker="o", label="two")
# no lines, marker size is 150:
plt.scatter(x, y2, color="steelblue", marker="*", s=150, label="three")
plt.legend()
plt.title("with legend");
```
%% Cell type:markdown id: tags:
After `plt.subplot(m, n, i)` the following plot commands draw into cell `i` of an `m` times `n` grid of plots. `m` is the number of rows, `n` is the number of columns and `i` is counted row-wise, starting with `1`:
%% Cell type:code id: tags:
``` python
# multiple plots
plt.figure(figsize=(12, 7)) # width, height
plt.subplot(2, 3, 1)
plt.plot(x, y0)
plt.plot(x, y1)
plt.title("plt.subplot(2, 3, 1)")
plt.subplot(2, 3, 2)
plt.plot(x, y1, "chocolate")
plt.title("plt.subplot(2, 3, 2)")
plt.subplot(2, 3, 3)
plt.plot(x, y2, "steelblue")
plt.title("plt.subplot(2, 3, 3)")
plt.subplot(2, 3, 4)
plt.plot(x, y1, ":")
plt.title("plt.subplot(2, 3, 4)")
plt.subplot(2, 3, 5)
plt.plot(x, y2, "*")
plt.title("plt.subplot(2, 3, 5)")
plt.subplot(2, 3, 6)
plt.plot(x, y0, "chocolate")
plt.title("plt.subplot(2, 3, 6)");
```
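%% Cell type:markdown id: tags:
As an alternative that is not used elsewhere in this script, `plt.subplots(m, n)` creates the whole grid at once and returns an array of axes. Note that this array is indexed with `[row, column]` starting at `0`, in contrast to the counter `i` of `plt.subplot`. The variable names below are chosen for illustration only.
%% Cell type:code id: tags:
```python
import matplotlib.pyplot as plt
import numpy as np

t = np.linspace(1, 4, 4)

# axes is a 2 x 3 array, axes[row, col] uses 0-based indices
fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].plot(t, np.sqrt(t), color="steelblue")
axes[0, 0].set_title("axes[0, 0]")

axes[1, 2].plot(t, np.mod(t, 2), color="chocolate")
axes[1, 2].set_title("axes[1, 2]")

fig.tight_layout()
```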
%% Cell type:code id: tags:
``` python
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
z = np.cos(x**2)
plt.plot(x, y, "chocolate")
plt.plot(x, z, "steelblue");
```
%% Cell type:markdown id: tags:
# Exercise section
1. Repeat the examples above and play with them
# * Optional Exercise
2. Can you plot a circle by computing `x` and `y` vectors suitable for `plt.plot`? Make sure that the circle looks like a circle and not like an ellipse.
3. Plot three circles with different radii and different colors, create labels and plot a legend. Make sure that the legend shows up in the top-right corner and does not overlap with the circles.
4. Plot the three circles in 3 different plots in one row using `plt.subplot`.
%% Cell type:code id: tags:solution
``` python
# SOLUTION FOR 2
import matplotlib.pyplot as plt
import numpy as np
rad = np.linspace(0, 2 * np.pi, 100)
r = 1
x = r * np.cos(rad)
y = r * np.sin(rad)
plt.figure(figsize=(5, 5))
plt.plot(x, y, color="chocolate")
plt.title("circle")
```
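%% Cell type:markdown id: tags:solution
The solution above relies on a square figure to avoid an ellipse. A possible alternative, shown here only as a sketch, is to ask `matplotlib` for the same scale on both axes with `set_aspect("equal")`. The variable names are chosen for illustration only.
%% Cell type:code id: tags:solution
```python
import matplotlib.pyplot as plt
import numpy as np

angle = np.linspace(0, 2 * np.pi, 100)
radius = 1
cx = radius * np.cos(angle)
cy = radius * np.sin(angle)

fig, ax = plt.subplots()
ax.plot(cx, cy, color="chocolate")
# same scale on both axes, independent of the figure size
ax.set_aspect("equal")
ax.set_title("circle with equal aspect ratio")
```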
%% Cell type:code id: tags:solution
``` python
# SOLUTION FOR 3
import matplotlib.pyplot as plt
import numpy as np
rad = np.linspace(0, 2 * np.pi, 100)
r = 1
def circle(r, color):
    # draw one labelled circle with radius r around the origin
    x = r * np.cos(rad)
    y = r * np.sin(rad)
    plt.plot(x, y, color=color, label="radius = {}".format(r))
plt.figure(figsize=(5, 5))
circle(1, "steelblue")
circle(0.75, "chocolate")
circle(0.5, "green")
plt.legend(loc="upper right")
# enlarge the axis limits so that the legend does not overlap the circles
plt.xlim([-1.7, 1.7])
plt.ylim([-1.7, 1.7]);
```
%% Cell type:code id: tags:solution
``` python
# SOLUTION FOR 4
import matplotlib.pyplot as plt
import numpy as np
rad = np.linspace(0, 2 * np.pi, 100)
r = 1
def circle(r, color, i, n):
    # draw a circle with radius r into plot i of a row of n plots
    x = r * np.cos(rad)
    y = r * np.sin(rad)
    plt.subplot(1, n, i)
    plt.plot(x, y, color=color)
    plt.title("radius = {}".format(r))
plt.figure(figsize=(16, 5))
circle(1, "steelblue", 1, 3)
circle(0.75, "chocolate", 2, 3)
circle(0.5, "green", 3, 3)
```
%% Cell type:code id: tags:
``` python
```