30

Is there a way to omit some of the output from the pandas describe? This command gives me exactly what I want with a table output (count and mean of executeTime's by a simpleDate)

df.groupby('simpleDate').executeTime.describe().unstack(1)

However that's all I want, count and mean. I want to drop std, min, max, etc... So far I've only read how to modify column size.

I'm guessing the answer is going to be to re-write the line, not using describe, but I haven't had any luck grouping by simpleDate and getting the count with a mean on executeTime.

I can do count by date:

df.groupby(['simpleDate']).size()

or executeTime by date:

df.groupby(['simpleDate']).mean()['executeTime'].reset_index()

But can't figure out the syntax to combine them.

My desired output:

            count  mean  
09-10-2013      8  20.523   
09-11-2013      4  21.112  
09-12-2013      3  18.531
...            ..  ...

6 Answers 6

47

.describe() attribute generates a Dataframe where count, std, max ... are values of the index, so according to the documentation you should use .loc to retrieve just the index values desired:

df.describe().loc[['count','max']]
34

Describe returns a series, so you can just select out what you want

In [6]: s = Series(np.random.rand(10))

In [7]: s
Out[7]: 
0    0.302041
1    0.353838
2    0.421416
3    0.174497
4    0.600932
5    0.871461
6    0.116874
7    0.233738
8    0.859147
9    0.145515
dtype: float64

In [8]: s.describe()
Out[8]: 
count    10.000000
mean      0.407946
std       0.280562
min       0.116874
25%       0.189307
50%       0.327940
75%       0.556053
max       0.871461
dtype: float64

In [9]: s.describe()[['count','mean']]
Out[9]: 
count    10.000000
mean      0.407946
dtype: float64
2
  • thanks so much, I tried something like that but had the syntax off. works great
    – KHibma
    Commented Oct 1, 2013 at 19:54
  • 3
    Describe returns Series or Dataframe depending upon what you apply it at... This method just works in case of Series
    – MANU
    Commented May 15, 2020 at 10:22
21

Looking at the answers, I don't see one that actually works on a DataFrame returned from describe() after using groupby().

The documentation on MultiIndex selection gives a hint at the answer. The .xs() function works for one but not multiple selections, but .loc works.

df.groupby(['simpleDate']).describe().loc[:,(slice(None),['count','max'])]

This keeps the nice MultiIndex returned by .describe() but with only the columns selected.

2
  • This is great but the loc syntax is wrong. It should be loc[…] (ie with square brackets).
    – Seth
    Commented Dec 16, 2020 at 11:28
  • 1
    This should be the accepted answer since it's the only one that works for DFs (with multiple columns that are "described"). The accepted answer only works for series
    – dopexxx
    Commented Mar 8, 2023 at 9:36
6

The solution @Jeff provided just works for series.

@Rafa is on the point: df.describe().info() reveals that the resulting dataframe has Index: 8 entries, count to max

df.describe().loc[['count','max']] does work, but df.groupby('simpleDate').describe().loc[['count','max']], which is what the OP asked, does not work.

I think a solution may be this:

df = pd.DataFrame({'Y': ['A', 'B', 'B', 'A', 'B'],
                    'Z': [10, 5, 6, 11, 12],
                                        })

grouping the df by Y:

df_grouped=df.groupby(by='Y')     


In [207]df_grouped.agg([np.mean, len])

Out[207]: 
        Z    
     mean len
Y            
A  10.500   2
B   7.667   3
1

Sticking with describe, you can unstack the indexes and then slice normally too

df.describe().unstack()[['count','max']]

1

Why do you want to use describe in first hand and generating more than you need to just discard it? Just generate agg instead and get directly what you want:

df.groupby('simpleDate').executeTime.agg(['count','max'])

Not the answer you're looking for? Browse other questions tagged or ask your own question.