Modify output from Python Pandas describe

Question

Is there a way to omit some of the output from pandas describe()? This command gives me what I want as a table (count and mean of executeTime by simpleDate):

df.groupby('simpleDate').executeTime.describe().unstack(1)

However, count and mean are all I want. I'd like to drop std, min, max, etc. So far I've only read how to modify column size.

I'm guessing the answer is going to be to re-write the line, not using describe, but I haven't had any luck grouping by simpleDate and getting the count with a mean on executeTime.

I can do count by date:

df.groupby(['simpleDate']).size()

or executeTime by date:

df.groupby(['simpleDate']).mean()['executeTime'].reset_index()

But I can't figure out the syntax to combine them.

My desired output:

            count  mean  
09-10-2013      8  20.523   
09-11-2013      4  21.112  
09-12-2013      3  18.531
...            ..  ...

Rafa · Accepted Answer · 2022-07-17 09:52:40Z

47

The .describe() method returns a DataFrame in which count, std, max, ... are values of the index, so according to the documentation you should use .loc to retrieve just the index values you want:

df.describe().loc[['count','max']]
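
As a small illustration of this row selection, the sketch below builds a toy DataFrame and keeps only the count and max rows of describe(). The column names and values are assumptions made up for the example, not data from the question.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({"executeTime": [20.0, 22.0, 18.0, 24.0],
                   "retries": [0, 1, 0, 3]})

# describe() puts the statistic names in the index, so .loc selects rows.
summary = df.describe().loc[["count", "max"]]
print(summary)
```

The result keeps one column per described column and only the two requested statistic rows.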

edited Jul 17, 2022 at 9:52

answered Sep 11, 2015 at 7:26

Rafa


Jeff · Accepted Answer · 2013-10-01 19:31:15Z

34

Calling describe() on a Series returns a Series, so you can just select what you want:

In [5]: import numpy as np; from pandas import Series

In [6]: s = Series(np.random.rand(10))

In [7]: s
Out[7]: 
0    0.302041
1    0.353838
2    0.421416
3    0.174497
4    0.600932
5    0.871461
6    0.116874
7    0.233738
8    0.859147
9    0.145515
dtype: float64

In [8]: s.describe()
Out[8]: 
count    10.000000
mean      0.407946
std       0.280562
min       0.116874
25%       0.189307
50%       0.327940
75%       0.556053
max       0.871461
dtype: float64

In [9]: s.describe()[['count','mean']]
Out[9]: 
count    10.000000
mean      0.407946
dtype: float64
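
The same kind of selection can be applied to the grouped column from the question. This is only a sketch under two assumptions: the sample rows are invented for illustration, and the installed pandas is recent enough that describe() on a grouped Series returns a DataFrame with one column per statistic (older versions returned a stacked Series, which is why the question calls unstack).

```python
import pandas as pd

# Invented sample rows; only the column names come from the question.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013", "09-11-2013"],
    "executeTime": [20.0, 21.0, 19.0, 23.0],
})

# One row per date, one column per statistic.
stats = df.groupby("simpleDate").executeTime.describe()

# Keep only the two statistics of interest.
count_mean = stats[["count", "mean"]]
print(count_mean)
```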

answered Oct 1, 2013 at 19:31

Jeff


thanks so much, I tried something like that but had the syntax off. works great
– KHibma
Commented Oct 1, 2013 at 19:54
3

describe() returns a Series or a DataFrame depending on what you call it on. This method only works for a Series.
– MANU
Commented May 15, 2020 at 10:22


Josh Ziegler · Accepted Answer · 2020-12-16 20:09:38Z

21

Looking at the answers, I don't see one that actually works on a DataFrame returned from describe() after using groupby().

The documentation on MultiIndex selection gives a hint at the answer. The .xs() method works for a single selection but not for multiple ones, whereas .loc handles both.

df.groupby(['simpleDate']).describe().loc[:,(slice(None),['count','max'])]

This keeps the nice MultiIndex returned by .describe() but with only the columns selected.
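
A self-contained sketch of this selection follows. The sample data and the second numeric column (retries) are assumptions added so that the column MultiIndex has more than one top-level entry.

```python
import pandas as pd

# Invented sample rows; "retries" is a hypothetical second numeric column.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013"],
    "executeTime": [20.0, 22.0, 19.0],
    "retries": [0, 2, 1],
})

described = df.groupby(["simpleDate"]).describe()

# Columns are a two-level MultiIndex: (original column, statistic).
# slice(None) keeps every original column; the list picks the statistics.
selected = described.loc[:, (slice(None), ["count", "max"])]
print(selected)
```

The result still has two column levels, so a single value can be read with a tuple key such as ("executeTime", "max").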

edited Dec 16, 2020 at 20:09

answered Nov 18, 2020 at 21:26

Josh Ziegler


This is great but the loc syntax is wrong. It should be loc[…] (ie with square brackets).
– Seth
Commented Dec 16, 2020 at 11:28
1

This should be the accepted answer since it's the only one that works for DFs (with multiple columns that are "described"). The accepted answer only works for series
– dopexxx
Commented Mar 8, 2023 at 9:36


st19297 · Accepted Answer · 2016-11-22 23:45:22Z

The solution @Jeff provided only works for a Series.

@Rafa is on point: df.describe().info() shows that the resulting DataFrame has Index: 8 entries, count to max.

df.describe().loc[['count','max']] does work, but df.groupby('simpleDate').describe().loc[['count','max']], which is what the OP asked about, does not.

I think a solution may be this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                   'Z': [10, 5, 6, 11, 12]})

Grouping the df by Y:

df_grouped = df.groupby(by='Y')

In [207]: df_grouped.agg([np.mean, len])

Out[207]: 
        Z    
     mean len
Y            
A  10.500   2
B   7.667   3
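
A variant of the same idea, sketched with named aggregation (available in pandas 0.25 and later) and string aggregation names instead of NumPy callables. It reuses the toy frame above; the output column names count and mean are chosen here to match the question's desired table.

```python
import pandas as pd

df = pd.DataFrame({"Y": ["A", "B", "B", "A", "B"],
                   "Z": [10, 5, 6, 11, 12]})

# Named aggregation: output_name=(input_column, aggregation).
result = df.groupby("Y").agg(count=("Z", "count"), mean=("Z", "mean"))
print(result)
```

This yields flat column names rather than a two-level header, which can be easier to work with downstream.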

Geoff Counihan · Accepted Answer · 2017-10-12 03:49:14Z

1

Sticking with describe, you can also unstack the index and then slice as usual:

df.describe().unstack()[['count','max']]
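
For a plain (ungrouped) DataFrame, a related sketch is to transpose the describe() result so that the statistics become columns, then select them with a list. Note that this uses .T rather than unstack, and the toy data is an assumption made for illustration.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({"executeTime": [20.0, 22.0, 18.0],
                   "retries": [0, 1, 3]})

# After transposing, each described column is a row and each statistic a column.
by_column = df.describe().T[["count", "max"]]
print(by_column)
```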

answered Oct 12, 2017 at 3:49

Geoff Counihan


Steffen · Accepted Answer · 2024-07-03 06:13:09Z

1

Why use describe in the first place, generating more than you need only to discard it? Use agg instead and get exactly what you want directly:

df.groupby('simpleDate').executeTime.agg(['count','max'])
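
Adapted to the count and mean that the question asks for, this could look like the sketch below. The sample rows are invented; only the column names simpleDate and executeTime come from the question.

```python
import pandas as pd

# Invented sample rows for illustration.
df = pd.DataFrame({
    "simpleDate": ["09-10-2013", "09-10-2013", "09-11-2013", "09-12-2013"],
    "executeTime": [20.0, 21.0, 21.5, 18.5],
})

# One row per date with just the two requested statistics.
summary = df.groupby("simpleDate")["executeTime"].agg(["count", "mean"])
print(summary)
```

The result is indexed by simpleDate with count and mean columns, which is the shape of the desired output in the question.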

answered Jul 3 at 6:13

Steffen
