Date created: 10 Feb 2020

Example how to access the climate database from python

For python we need to use SQL to query the database. We will use the pandas package to directly read in the results. For more information how to create an SQL query, you can look at https://www.w3schools.com/sql/sql_syntax.asp and/or https://tutorialzine.com/2016/01/learn-sql-in-20-minutes

Note that the access to the database is restricted to read-only, i.e. no new tables or records can be created by the users.

In [1]:
import pandas as pd
import psycopg2
In [2]:
# establish connection using information supplied in documentation
conn = psycopg2.connect(host="10.8.244.31",
                       database="climate_data",
                       user="climate_user",
                       password="meteo_data",
                       port=5432)
cur = conn.cursor()
In [3]:
# get the metadata
query = """
SELECT * FROM metadata
"""
tbl_meta = pd.read_sql_query(query, conn)
tbl_meta.info()
tbl_meta.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 346 entries, 0 to 345
Data columns (total 16 columns):
id           346 non-null int64
geom         346 non-null object
LAT          346 non-null float64
LON          346 non-null float64
ELE          346 non-null float64
P_use        281 non-null float64
P_lenght     281 non-null float64
P_source     280 non-null object
P_homo       281 non-null float64
T_use        306 non-null float64
T_lenght     306 non-null float64
T_source     306 non-null object
Tmin_homo    306 non-null float64
Tmax_homo    306 non-null float64
Province     346 non-null object
NAME         346 non-null object
dtypes: float64(10), int64(1), object(5)
memory usage: 43.4+ KB
Out[3]:
id geom LAT LON ELE P_use P_lenght P_source P_homo T_use T_lenght T_source Tmin_homo Tmax_homo Province NAME
0 260 0101000020E61000003A259CED3CAD254096BF10AC091F... 46.242483 10.838355 1681.0 1.0 2.0 None 1.0 1.0 2.0 METEOTRENTINO 0.0 1.0 TN Passo_Campo_Carlo_Magno
1 27 0101000020E61000004DC57C2F744526401C84905E9A7F... 46.996898 11.135652 2290.0 1.0 0.0 BZ-Province 0.0 NaN NaN None NaN NaN AU Dresdner_Hütte
2 30 0101000020E6100000305A91E22D992640D490E146043E... 46.484506 11.299178 239.0 0.0 2.0 BZ-Province 0.0 NaN NaN None NaN NaN BZ Etsch_Sigmundskron (M)
3 63 0101000020E6100000C29AB10510D826408F81442C4582... 47.017736 11.421997 1360.0 1.0 2.0 BZ-Province 0.0 NaN NaN None NaN NaN AU Obernberg_am_Brenner
4 164 0101000020E61000006666666666E62440F34E70777777... 46.933333 10.450000 2730.0 0.0 0.0 METEOSWISS 0.0 1.0 2.0 METEOSWISS 0.0 0.0 CH Vinadi_Alpetta
In [4]:
# get some sample data from station "deutschnofen" and since 2012
query = """
SELECT * FROM stations_data
WHERE date > '2012-01-01' AND station = 'Deutschnofen'
"""

tbl_data = pd.read_sql_query(query, conn)
tbl_data.info()
tbl_data.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2921 entries, 0 to 2920
Data columns (total 7 columns):
date        2921 non-null object
tmin        2605 non-null float64
tmax        2605 non-null float64
tmean       2605 non-null float64
prec        2599 non-null float64
province    2921 non-null object
station     2921 non-null object
dtypes: float64(4), object(3)
memory usage: 159.9+ KB
Out[4]:
date tmin tmax tmean prec province station
0 2012-01-02 -2.3 1.5 -0.4 0.0 BZ Deutschnofen
1 2012-01-03 -2.6 1.6 -0.5 17.4 BZ Deutschnofen
2 2012-01-04 -3.8 0.3 -1.8 0.0 BZ Deutschnofen
3 2012-01-05 -4.6 -0.4 -2.5 0.0 BZ Deutschnofen
4 2012-01-06 -6.3 -2.4 -4.3 1.4 BZ Deutschnofen
In [5]:
# close the connection when finished
cur.close()
conn.close()