I have a Pandas DataFrame of measurements:

,Fp076,Fp084,Fp092,Fp099,Fp107,Fp115,Fp122,Fp130,Fp143,Fp151,Fp158,Fp166,Fp174,Fp181,Fp189,Fp197,Fp204,Fp212,Fp220,Fp227
0,0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147
1,-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467
2,0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189
3,2.6838,2.394591,2.493416,0.874906,2.113343,1.812258,1.667047,1.779347,1.515663,1.620196,1.539494,1.63528,1.555373,1.471318,1.610067,1.507087,1.467174,1.458346,1.681998,1.14625
4,0.368415,0.435004,0.155035,0.161064,0.180133,0.202117,0.142981,0.138321,0.122557,0.099213,0.098213,0.062174,0.123664,0.2051,0.167415,0.185133,0.127677,0.037875,0.156252,0.015579
5,0.213577,0.187244,0.274151,0.173572,0.296122,0.308341,0.164578,0.159559,0.318383,0.181329,0.260223,0.257395,0.241779,0.292731,0.244476,0.187523,0.247331,0.293338,0.323894,0.179478
6,0.096093,0.140454,0.067185,6.441058,0.016797,0.141757,0.181792,0.13692,0.204091,0.180182,0.149626,0.220342,0.179286,0.276316,0.104531,0.20343,0.045161,-0.004546,0.045833,0.193849
7,0.286467,0.086673,-0.106538,-0.261802,0.16964,0.182858,0.062774,0.20471,0.040105,0.086975,0.211068,0.182423,0.098721,0.077085,0.102986,0.129935,0.130571,0.176024,0.154079,0.102391
8,0.480631,0.714554,0.858241,0.746666,0.555411,0.452689,0.337912,0.333942,0.269359,0.221312,0.09818,0.226218,0.287361,0.209858,0.222951,0.207584,0.258397,0.026713,0.162048,0.149924
9,1.055405,0.638777,0.468793,0.41544,0.559187,0.471218,0.493805,0.544716,0.412903,0.412182,0.51041,0.383991,0.351397,0.383201,0.368308,0.237954,0.330242,0.262648,0.425204,0.434928
10,1.116658,0.737544,0.854376,-0.004434,0.419419,0.35921,0.377095,0.273815,0.258913,0.290614,0.271843,0.321572,0.234764,0.298931,0.206039,0.192746,0.200727,0.132419,0.229914,0.159857
11,-0.004305,0.052289,0.275035,-0.849414,0.104146,0.185819,0.128376,0.136433,0.091787,0.149753,0.107246,0.081407,0.118816,0.117434,0.169153,0.108273,0.205751,0.145238,0.153086,0.114278
12,0.836223,0.323901,0.269564,0.364082,0.343695,0.386785,0.24881,0.307267,0.222634,0.214189,0.12167,0.251107,0.134083,0.284545,0.175479,0.221877,0.184749,0.225089,0.205388,0.214972

where each row holds the flux measurements at the frequencies given in the header (76, 84, 92, 99, ... MHz). I'm trying to plot a line graph of the flux measurements for a row. Since the frequencies in the header are not evenly spaced, I've tried this:

import numpy as np
import matplotlib.pyplot as plt

# frequencies (MHz) taken from the column headers
f = np.array([76,84,92,99,107,115,122,130,143,151,158,166,174,181,189,197,204,212,220,227])
y1 = [0.531743,0.512256,0.427771,0.444216,0.332228,0.296139,0.202653,0.298724,0.341529,0.276829,0.24803,0.278406,0.345853,0.317384,0.32032,0.179936,0.205871,0.495948,0.167417,0.097147]
y2 = [-0.032964,0.047469,0.128079,0.142839,0.253755,0.165963,0.210111,0.239816,0.162333,0.115085,0.129781,0.134795,0.09575,0.243093,0.10684,0.195201,0.143984,0.266312,0.198049,0.084467]
y3 = [0.459728,0.541346,0.830889,0.368135,0.407241,0.499617,0.383159,0.507517,0.409411,0.325441,0.305605,0.378738,0.342981,0.43766,0.295844,0.228164,0.276319,0.226467,0.375678,0.219189]

fig, ax = plt.subplots()
ax.scatter(f, y1, label = r'$\alpha = -0.37$')
ax.plot(f, y1)
ax.scatter(f, y2, label = r'$\alpha = NaN$')
ax.plot(f, y2)
ax.scatter(f, y3, label = r'$\alpha = -0.75$')
ax.plot(f, y3)
ax.set_xlabel('Frequency (MHz)')
ax.set_ylabel('Flux (Jy/beam)')
ax.grid(which = 'both', axis = 'both')
# show the labels set on the scatter calls
ax.legend()
plt.show()

which is just copy-pasting the first three rows of data, to produce:

[Image: scatter and line plot of the first three rows]

That's basically what I want, but what's a better way to do it?

  • The default pandas plot mode is to plot each column as a separate line. If you took the transpose of your dataframe, each row would turn into a column. Commented Jul 17, 2022 at 0:00
  • This is the correct way: 1. df.columns = df.columns.str.replace('Fp', '').astype('int'), 2. df = df.T, 3. ax = df.plot(marker='.', figsize=(10, 7), title='Flux per Frequency', ylabel='Flux (Jy/beam)', xlabel='Frequency (MHz)', grid=True) Commented Jul 17, 2022 at 0:13
  • See code and plot Commented Jul 17, 2022 at 0:48
  • The hsv colormap was used to add more colors, since there are many observations. There are more colormaps at Choosing Colormaps in Matplotlib Commented Jul 17, 2022 at 1:10
  • df = pd.read_csv('file.csv', index_col=[0]) Commented Jul 17, 2022 at 14:22
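
The steps from the comments above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `csv_text` is a hypothetical stand-in that holds only the first three rows and first five columns of the question's data, and it assumes the real DataFrame has the same `Fp###` header layout.

```python
import io

import pandas as pd

# Hypothetical stand-in: first three rows and first five columns of the
# question's data. In practice this would be the full DataFrame.
csv_text = """,Fp076,Fp084,Fp092,Fp099,Fp107
0,0.531743,0.512256,0.427771,0.444216,0.332228
1,-0.032964,0.047469,0.128079,0.142839,0.253755
2,0.459728,0.541346,0.830889,0.368135,0.407241
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# 'Fp076' -> 76: strip the prefix so the column labels are numeric frequencies.
df.columns = df.columns.str.replace("Fp", "", regex=False).astype(int)

# Transpose so each measurement row becomes a column; DataFrame.plot then
# draws one line per column against the numeric index.
ax = df.T.plot(marker=".", grid=True)
ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
```

Because the index is numeric after the conversion, the points are placed at their actual frequencies rather than at evenly spaced positions. Call `plt.show()` from `matplotlib.pyplot` to display the figure when running as a script.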

1 Answer

There are many ways to solve this problem, but the simplest way (that I can think of) is to transpose your dataframe and then use seaborn to plot all the columns:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# `data_` is assumed to hold the question's CSV text as a single string
# parse the sample data into rows of fields, skipping empty lines and fields
data = [[e for e in row.split(',') if e] for row in data_.split("\n") if row]
columns = data[0]
# create the `x` axis: 'Fp076' -> 76
columns = [int(col.replace('Fp','')) for col in columns]
# the data rows start with the row index, so add a name for that column
columns = ['index'] + columns
data = data[1:]
df = pd.DataFrame(data=data, columns=columns)
df = df.drop(columns=['index'])
df = df.astype('float')

This is an example of the dataframe without transforming the headers with int(col.replace('Fp','')):

[Image: the dataframe with its original Fp076, Fp084, ... column names]

If you already have the dataframe, you can transform its columns as I did above using:

df.columns = [int(col.replace('Fp',''))  for col in df.columns]

Once this is done, you can transpose the dataframe and plot it:

# transpose the data so that each row becomes a column
df_ = df.T

# plot your data
plt.figure(figsize=(15,8))
sns.lineplot(data=df_)
plt.title('Example of timeseries plot')
plt.xlabel('Frequency(MHz)')
plt.ylabel('Flux (Jy/beam)')

The output is:

[Image: line plot of flux against frequency, one line per row of the original dataframe]

You can adjust the plot to your liking, but this is the simplest way. Tip: lean on the seaborn or pandas plotting methods as much as possible for aggregated plots like this.
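
If only a few rows are wanted, each with its own legend label as in the question's figure, a plain matplotlib loop over the selected rows may be enough. The sketch below is only illustrative: `csv_text` is a hypothetical stand-in with the first three rows and first five columns of the question's data, and the `labels` mapping simply copies the legend strings from the question's code.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in: first three rows and first five columns of the
# question's data. In practice this would be the full DataFrame.
csv_text = """,Fp076,Fp084,Fp092,Fp099,Fp107
0,0.531743,0.512256,0.427771,0.444216,0.332228
1,-0.032964,0.047469,0.128079,0.142839,0.253755
2,0.459728,0.541346,0.830889,0.368135,0.407241
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Numeric x values parsed from the headers: 'Fp076' -> 76.
freqs = df.columns.str.replace("Fp", "", regex=False).astype(int).to_numpy()

# Legend labels copied from the question's code, keyed by row index.
labels = {
    0: r"$\alpha = -0.37$",
    1: r"$\alpha = NaN$",
    2: r"$\alpha = -0.75$",
}

fig, ax = plt.subplots()
for row, label in labels.items():
    # One call draws both the line and its markers for the selected row.
    ax.plot(freqs, df.loc[row].to_numpy(), marker="o", label=label)
ax.set_xlabel("Frequency (MHz)")
ax.set_ylabel("Flux (Jy/beam)")
ax.grid(which="both", axis="both")
ax.legend()
```

Using `marker="o"` in `ax.plot` replaces the separate `scatter` and `plot` calls per row, and the loop avoids copy-pasting values out of the dataframe.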

6 Comments

The OP states the data already starts in a dataframe. This is not the correct way to begin when the data is already in a dataframe.
I know, but sadly the data is not in a pandas dataframe in the question. The first part of the code above takes what he has and puts it in a dataframe. I can be explicit and tell him to change the columns in his dataframe to match mine, but apart from that it is consistent.
The first sentence of the question is "I have a Pandas DataFrame of measurements:".
Yes, but I don't have the dataframe, so I have to create it from the values he has provided. I did not think I had to explain that, but I have edited the answer to make it clearer.
The answer should not include constructing the dataframe, because that is irrelevant to the question. The OP already starts with a dataframe. The answer should show what to do with the dataframe to create the plot. There is no reason to use seaborn, because the dataframe can be plotted directly, as has already been demonstrated in a comment to the question.