Processing data with Pandas
Now you should know the basics of the data structures in Pandas and how to explore your data using some of the tools that Pandas provides. Next, we continue exploring some of the basic data operations that are regularly needed in data analysis.
Let’s first read the same data as before into Pandas to have a clean start.
In [1]: import pandas as pd
   ...: dataFrame = pd.read_csv('Kumpula-June-2016-w-metadata.txt', sep=',', skiprows=8)
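The `skiprows=8` argument tells `read_csv` to skip the metadata lines at the top of the file before it reads the header row. The sketch below tries the same call on an in-memory text buffer, assuming a file with the same layout. The eight metadata lines, the column names `YEARMODA`, `TEMP`, `MAX`, `MIN` and the values are hypothetical stand-ins, not the contents of the real Kumpula file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the data file: eight metadata lines,
# followed by a comma-separated header row and a few data rows.
metadata = "\n".join(f"# metadata line {i}" for i in range(1, 9))
table = (
    "YEARMODA,TEMP,MAX,MIN\n"
    "20160601,65.5,73.6,54.7\n"
    "20160602,65.8,80.8,55.0\n"
    "20160603,68.4,77.9,55.6\n"
)
text = metadata + "\n" + table

# skiprows=8 drops the first eight lines, so the header row is read first.
dataFrame = pd.read_csv(io.StringIO(text), sep=",", skiprows=8)
print(dataFrame.columns.tolist())  # ['YEARMODA', 'TEMP', 'MAX', 'MIN']
print(len(dataFrame))              # 3
```

If `skiprows` is set too low, a metadata line is read as the header instead. Checking `columns` right after loading is therefore a quick way to confirm the value is correct.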
Calculating with DataFrames
One of the most common things to do in Pandas is to create new columns based on calculations between different variables (columns).
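Arithmetic between columns is applied element-wise, row by row, without an explicit loop. The following is a minimal sketch that uses a small hypothetical DataFrame with `MAX` and `MIN` columns (illustrative values, not the Kumpula observations). The result goes into a hypothetical `RANGE` column.

```python
import pandas as pd

# Small hypothetical DataFrame standing in for the weather data.
df = pd.DataFrame({"MAX": [73.6, 80.8, 77.9], "MIN": [54.7, 55.0, 55.6]})

# Subtracting one column from another works element-wise:
# each row's MIN is subtracted from the same row's MAX.
df["RANGE"] = df["MAX"] - df["MIN"]
print(df)
```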
Creating a new column in our DataFrame is easy: specify the name of the column and assign it a default value (in this case the decimal number 0.0).
In [2]: dataFrame['DIFF'] = 0.0
In [3]: print(dataFrame)
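Assigning a single scalar to a new column broadcasts that value to every row. The following is a minimal sketch on a small hypothetical DataFrame rather than the real data file. It shows that `DIFF` becomes a `float64` column filled with zeros, which later calculations can overwrite.

```python
import pandas as pd

# Hypothetical stand-in for the weather data.
df = pd.DataFrame({"TEMP": [65.5, 65.8, 68.4]})

# A scalar on the right-hand side is repeated for every row.
df["DIFF"] = 0.0
print(df["DIFF"].dtype)     # float64
print(df["DIFF"].tolist())  # [0.0, 0.0, 0.0]
```

Using `0.0` rather than `0` makes the column start out as floating point. Decimal results assigned to it later therefore fit the column's type without conversion.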