Newest 'group-by' Questions

-4 votes

3 answers

168 views

Returning multiple columns and count of duplicates with GROUP BY clause

I'm searching for duplicate values per column and need a count of them and data from some additional columns. Table sample BillNr Name email 1000 Shakira [email protected] 1001 Shakira [email protected] ...

Kaptah

9,887

asked Nov 22 at 14:20
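
For the duplicate-count question above, one possible approach is to count rows per value in a grouped subquery and join that count back to the detail rows, so the extra columns stay available. The following is only a minimal sketch run through Python's built-in sqlite3 module. The table name `bills`, the placeholder email addresses, and the third row are assumptions for illustration, and the asker's actual database engine is not stated in the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (BillNr INTEGER, Name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?)",
    [
        (1000, "Shakira", "s@example.com"),  # placeholder address (assumption)
        (1001, "Shakira", "s@example.com"),  # placeholder address (assumption)
        (1002, "Adele", "a@example.com"),    # hypothetical non-duplicate row
    ],
)

# Count rows per Name, keep only names that occur more than once,
# then join back so every duplicate row keeps its other columns.
query = """
SELECT b.BillNr, b.Name, b.email, d.dup_count
FROM bills AS b
JOIN (
    SELECT Name, COUNT(*) AS dup_count
    FROM bills
    GROUP BY Name
    HAVING COUNT(*) > 1
) AS d ON d.Name = b.Name
ORDER BY b.BillNr
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Joining against a grouped subquery avoids listing every extra column in the GROUP BY clause; the same idea can be repeated per column if duplicates must be checked column by column.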

0 votes

2 answers

103 views

Grouping of records in case values are null

We have a table with an identifier, key/value pairs, and start and end timestamps that indicate the validity period for the values. MASTER_WORK_ORDR_ID START_TS END_TS WORK_ORDR_ID_CTXT ...

L.P.

13

asked Nov 19 at 15:26

3 votes

3 answers

157 views

How to retrieve a sub-array from result of array_agg?

I have a SQL table in postgres 14 that looks something like this: f_key data1 data2 fit 1 {'a1', 'a2'} null 3 1 {'b1', 'b2'} {'b3'} 2 2 {'c1', 'c2'} null 3 Note that data1 and data2 are arrays. I need ...

fitek

303

asked Nov 15 at 11:01

1 vote

5 answers

98 views

Grouping rows, and then deleting only a sub range (based on their dates) from each of those groups

I use Postgres on my web server in order to record incoming queries into a table calls2, basically writing a single row each time with lots of repeating information, such as a date field ("when...

Thomas Tempelmann

12.4k

asked Nov 12 at 17:52

0 votes

0 answers

108 views

GridDB SQL error while using GROUP BY RANGE

I am getting an error when using GROUP BY RANGE in GridDB SQL. I am referring to the example mentioned in the docs: https://griddb.org/docs-en/manuals/GridDB_SQL_Reference.html#group-by-range name: trend_data1 ts ...

sayana_dutta

73

asked Nov 8 at 15:07

3 votes

4 answers

223 views

Filter a pandas df: per group, keep only non-null rows if we have them, else keep a single null row

Hopefully the title is reasonably intuitive, edits welcome. Say I have this dataframe: df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'], 'y': [None, None, 1, 2, 3, 4,...

Hendy

10.7k

asked Nov 5 at 21:20

0 votes

1 answer

48 views

How to group by geography in BigQuery

I have the following code: SELECT h3s.h3id, h3s.geog, MIN(ST_DISTANCE(`carto-os`.carto.H3_CENTER(htsp.h3id), `carto-os`.carto.H3_CENTER(h3s.h3id))) OVER (PARTITION BY h3s.h3id) FROM ...

Chris

67

asked Oct 15 at 21:26

1 vote

2 answers

104 views

Grouping data by season in R when winter includes December from the previous year

I have a dataset called TotalPhosphorus, and I want to assign seasons to each observation. However, I need the winter season to include December from the previous year and January–March from the ...

Daniela

17

asked Oct 15 at 20:06
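
The season question above is about R, but the underlying idea (credit December to the winter of the following year before grouping) is language-independent. Below is a rough sketch in Python rather than R. The function name `winter_label`, the sample observations, and the choice to label a winter by the year of its January are all assumptions, not details from the question.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def winter_label(d: date) -> Optional[int]:
    """Return the year a winter observation is credited to, or None outside winter."""
    if d.month == 12:
        return d.year + 1  # December counts toward the following year's winter
    if d.month in (1, 2, 3):
        return d.year
    return None


# Hypothetical (date, value) observations, for illustration only.
observations = [
    (date(2023, 12, 15), 0.8),
    (date(2024, 1, 20), 1.0),
    (date(2024, 7, 4), 0.5),
    (date(2024, 12, 2), 0.6),
]

# Group by the derived winter label and sum the values per winter.
winter_totals = defaultdict(float)
for d, value in observations:
    label = winter_label(d)
    if label is not None:
        winter_totals[label] += value

print(dict(winter_totals))
```

Deriving the grouping key first and grouping second keeps the year-boundary rule in one place; the same pattern should carry over to a mutate-then-group_by step in R.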

3 votes

4 answers

129 views

Creating a group by loop using either single or multiple variables from a list in R

I am trying to perform a loop which loops through a list of single or multiple variables then sums a column. I am essentially trying to paste in from a list into the group_by() function so that it ...

KatChristiansen

63

asked Oct 13 at 11:48

1 vote

1 answer

62 views

How to reverse a DolphinDB table aggregated by group by + toArray back to its original form?

I have an in-memory DolphinDB table created as follows: ticker = `AAPL`IBM`IBM`AAPL`AMZN`AAPL`AMZN`IBM`AMZN volume = 106 115 121 90 130 150 145 123 155; t = table(ticker, volume); t; The output of ...

Dongyun Huang

11

asked Oct 13 at 9:21

2 votes

1 answer

88 views

How to combine multiple rows of Pandas dataframe into one row using a key [duplicate]

I am trying to manipulate a CSV using Pandas and I need to get the data into the format of one row per ID. This is an example of what I am trying to accomplish: From: df = pd.DataFrame({ 'ID': [1, 1, ...

sar

21

asked Oct 10 at 17:26

0 votes

1 answer

94 views

R arrange after grouping [duplicate]

I have noticed that although df %>% group_by(firm) %>% arrange(week) %>% mutate(lag_sales = lag(sales)) %>% ungroup() ignores the grouping but calculates the correct lags as the ...

ZayzayR

307

asked Oct 1 at 22:28

2 votes

1 answer

78 views

New columns with values from other rows based on adjacency

I have a dataframe in R that looks like this (spaced to ease readability): utterance word syllable label syll_start syll_end 1 1 1 NA 1.1 2 1 1 ...

Jackson

75

asked Sep 29 at 18:03

0 votes

0 answers

27 views

Python display and count unique elements from a dataset [duplicate]

I have a dataset populated from an API call to Splunk. The dataset contains the following: time destip destport transport 2025-09-17 22:03:09 172.16.5.1 53 UDP 2025-09-17 22:03:10 172.16.5.1 53 UDP ...

Jhowel

63

asked Sep 25 at 13:50
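
For the unique-element count asked about above, a small sketch with the standard library's `collections.Counter` may be enough. It assumes that "unique" means a distinct (destip, destport, transport) combination with the timestamp ignored, and the third row is a hypothetical addition to the two rows visible in the excerpt.

```python
from collections import Counter

rows = [
    ("2025-09-17 22:03:09", "172.16.5.1", 53, "UDP"),
    ("2025-09-17 22:03:10", "172.16.5.1", 53, "UDP"),
    ("2025-09-17 22:03:11", "172.16.5.2", 443, "TCP"),  # hypothetical extra row
]

# Count each distinct (destip, destport, transport) combination, ignoring the time.
counts = Counter(
    (destip, destport, transport) for _time, destip, destport, transport in rows
)

for (destip, destport, transport), n in counts.most_common():
    print(destip, destport, transport, n)
```

If the data already lives in a pandas DataFrame, grouping on the same three columns and taking the group size would express the same count.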

3 votes

2 answers

183 views

Group by column, make a new column with label corresponding to highest value in another column

Here is an example of my data: sound word part syllable pitch_peak sound-1 mary subject 1 3.1 sound-1 mary subject 2 1.9 sound-1 studied verb 1 ...

Jackson

75

asked Sep 23 at 14:02

4 votes

0 answers

138 views

Hourly true average between timestamps [closed]

I’m storing IoT readings in a GridDB container and need one row per hour with the true average of the points that actually fall inside each hour (not interpolated values): ts_bucket ...

Badhon Ashfaq

907

asked Sep 19 at 5:47

0 votes

1 answer

59 views

How can I aggregate all columns with a 'number' type in Power Query

I'm trying to use Power Query to aggregate some invoicing columns by project number. I'm currently using a group by function which looks at the project number and then aggregates each ...

Stephanie Noyce

1

asked Sep 17 at 19:17

1 vote

1 answer

126 views

group_by with polars concatenating values

I have a Polars dataframe that I want to group and then concatenate the unique values of each group into a single entry. In pandas, I do: def unique_colun_values(x): return('|'.join(set(x))) dd=pd.DataFrame({'...

frank

3,816

asked Sep 16 at 9:16

3 votes

1 answer

99 views

How to group by day in GridDB Cloud without manually concatenating year, month, and day?

Table schema: CREATE TABLE WeatherReadings ( ts TIMESTAMP, temp DOUBLE ); Sample data: INSERT INTO WeatherReadings (ts, temp) VALUES (TIMESTAMP('2025-08-22T01:05:00Z'), 20.5), (TIMESTAMP('...

Mr Jahangir

33

asked Sep 12 at 19:55

1 vote

2 answers

100 views

Remove items within pandas DataFrameGroupBy groups

I have a dataframe df made up of n columns which are groups and one, "data". This dataframe is then grouped on the n group columns. df = pd.DataFrame(data={"g0": ["foo", ...

Aristide

43

asked Sep 8 at 13:33

0 votes

1 answer

79 views

XSLT - For-Each-Group - GroupBY not working on 2 groupby value

I am trying to use XSLT in my application (OIC) where, based on the input structure, I have to construct an output file which filters the records based on 2 elements. Input structure: <?xml version='1.0' ...

kumarb

497

asked Sep 8 at 10:29

7 votes

3 answers

444 views

How to sort pandas groups by (multiple/all) values of the groups?

I am trying to do a somewhat complicated group and sort operation in pandas. I want to sort the groups by their values in ascending order, using successive values for tiebreaks as needed. I have read ...

Jessica

1,813

asked Aug 26 at 20:54

0 votes

2 answers

122 views

Using GROUP BY - check for multiple conditions with a WHERE or HAVING clause for a single ID [closed]

I have a table of customer data. I will be joining it to a location table. Customer ID is distinct but Location ID is not because multiple customers can belong to one location. Each customer is ...

HawaiianShirts

15

asked Aug 8 at 21:24

0 votes

2 answers

81 views

DAX concatenate list of a column value (ex. contract) grouped by date

I'm trying to create a list of contracts that expire by date. I looked on many sites for a solution. I have a measure that calculates the date, and I need a calculated table with a summarized ...

Pat N.

47

asked Aug 1 at 14:03

-2 votes

2 answers

189 views

Why does grouping a pandas Series by the same Series make no sense?

In the code example below I am grouping a pandas series using the same series but with a modified index. The groups in the end make no sense. There is no warning or error. Could you please help me ...

karpan

597

asked Jul 29 at 9:24

2 votes

2 answers

94 views

Pandas dt accessor or groupby function returning decimal numbers instead of integers in index labels where some series values NA

We're trying to group up date counts by month and index values are returning as decimals instead of integers when series contain any number of NaTs / na values. Simplified reproducible example: import ...

Chris Dixon

1,148

asked Jul 29 at 3:54

0 votes

2 answers

113 views

How to pick the latest record with GROUP BY userId

In my application I want to find the latest duty of each user from 'StaffDuty' table using hibernate query (i.e. HQL). Below is my query. query = session.createQuery("FROM StaffDuty where deptId....

KJEjava48

2,073

asked Jul 28 at 11:14

0 votes

1 answer

35 views

How to group by a column and calculate correlation coefficients between multiple columns?

I'm encountering some issues when trying to perform grouped correlation calculations in DolphinDB. Here's my scenario: I'm using DolphinDB to calculate correlations between multiple columns in a table....

RORO

1

asked Jul 24 at 8:52

0 votes

3 answers

105 views

PySpark groupBy().applyInPandas() fails with INVALID_PANDAS_UDF despite correct signature and schema for GROUPED_MAP

NOTE: This question has many related questions on Stack Overflow but I was unable to get my answer from any of them. I'm attempting to parallelize Prophet time series model training across multiple ...

Arnab Sinha

340

asked Jul 22 at 4:05

1 vote

2 answers

106 views

How do I get the last valid (non-null, non-zero) value per day in a time-series SQL query?

I’m working with time-series data in SQL Server and need to retrieve the last valid value for each day. A valid value is defined as one that is non-null and not zero. The challenge is that data points ...

vishal_gosai

21

asked Jul 9 at 17:30

1 vote

1 answer

65 views

XSLT for-each with using last and aggregation

I am not sure if this is possible with XSLT but I am trying to get the below XML into a format where it is name, title, date (if same date then only get date once), last value of In time (might not be ...

BryanG

29

asked Jul 1 at 20:07

1 vote

1 answer

72 views

TimeScaleDb/Postgres: Materialized Views(COGG): GROUP BY: group by certain field values

What I'm currently doing is this: SELECT time_bucket('60 min', raw_data.timestamp) AS time_60min, COUNT(raw_data.vehicle_class) AS "count", raw_data.vehicle_class AS "...

PhilippR

23

asked Jun 30 at 15:04

1 vote

0 answers

36 views

Use Object.groupBy function to group by a variable [duplicate]

How can I use the Object.groupBy function with a variable? For example: const inventory = [ { Phase: "Phase 1", Step: "Step 1", Task: "Task 1", Value: "5" }, ...

ConsMI

19

asked Jun 27 at 13:28

2 votes

1 answer

285 views

Dataframe behavior: Pandas 1.1.5 vs 2.3.0

I recently had to update the virtual environment for one of my libraries from Python 3.7 to 3.10, which also involved updating Pandas from 1.1.5 to 2.3.0. In the previous virtual environment, this ...

Jan Stuller

151

asked Jun 25 at 13:00

0 votes

0 answers

69 views

Preprocessing Data with Scale and then Binarize in Python

I am working on some proofs of concept for ML and want to try an unusual scaling method. I would like to group my data, "scale" it, and then binarize it. Basically I ...

Tim Romero

11

asked Jun 16 at 6:46

0 votes

1 answer

92 views

MS Access Reports: How do I group by two fields on the same level (OR?)

I have a database of music manuscripts that looks like the below diagram. A 'Source item' belongs to a certain manuscript (source). A source item is then categorized as EITHER a 'Section' of a 'Piece' ...

tapemachine86

33

asked Jun 10 at 8:31

1 vote

1 answer

134 views

How do I define a week start frequency in Pandas?

I am trying to come up with a frequency in Pandas that represents the start of a calendar week (configurable by week start). For example, for all dates from 2025-01-06 (Monday) to 2025-01-12 (Sunday), ...

bhub

149

asked Jun 6 at 14:50

2 votes

1 answer

53 views

Only display the top N rows in a dataframe that was aggregated with statistical functions but keep the primary sort

Suppose I have this: ISresult = h25.groupby(['month','impactedservice']).agg({'resolvetime': ['count','median','mean', 'min', 'max','std']}) The column list looks like this: [('resolvetime', 'count'),...

Mark G

97

asked Jun 6 at 1:22

0 votes

1 answer

35 views

Calculate difference between two rows by group in DolphinDB?

Sharing a common DolphinDB use case and solution for data processing. I have a table with four columns: order_book_id, date, Q, and revenue. I want to group the data by order_book_id and date, and ...

saki

319

asked May 28 at 9:25

0 votes

3 answers

120 views

Min and Max value on multiple cells group by third column value

I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2), grouped by "Name". I have data that looks like this: start_1 end_1 start_2 end_2 name 100 ...

soosa

165

asked May 26 at 7:29

1 vote

3 answers

127 views

How to divide one group of rows by another one in a pandas long format DataFrame to compute e.g ratios?

In pandas, I have the following long-format dataframe with one binary variable « Metric » with two modalities (number of rooms in the residence, square meters of the residence): pd.DataFrame({'State': {0: 'New ...

Lucas

59

asked May 25 at 10:17

0 votes

1 answer

97 views

Does BigQuery `GROUP by grouping set` perform better than `Group By Union`

BigQuery has newly added GROUP BY GROUPING SETS [1]. Its syntax is simpler than the traditional GROUP BY plus UNION approach. I wonder if it also performs much better, because grouping sets only scan the source ...

Hui Zheng

3,247

asked May 21 at 18:39

-1 votes

1 answer

57 views

Group by multiple strings into one field in Vertica [closed]

With this data: name movie john big daddy bob titanic john avatar I want the output to be: name movie john big daddy, avatar bob titanic I tried this: SELECT name, LIST_AGG(movie) from people.table ...

yesiamamir

49

asked May 21 at 9:00

0 votes

2 answers

81 views

How do I sum over different combinations of variables in DBV?

Following is my table. I want to sum across all different combinations and put the sum in separate columns, not in the same column. data: Subject Var1 Var2 Var3 Var4 Constant1 Constant2 ONE 1 2 1 1 A ...

user10969476

21

asked May 19 at 17:25

0 votes

0 answers

19 views

In DolphinDB, how to calculate stock residual return based on a table using the ols function?

I have a table “clean_factor” where the column “y” indicates the stock returns and the subsequent columns indicate factor exposures. How do I calculate the daily residual return of each stock with the ...

smile qian

9

asked May 15 at 7:34

1 vote

3 answers

94 views

In sql, group by using similar group_name

How can I perform a GROUP BY in SQL when the group_name values are similar but not exactly the same? In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "...

Ahamad

1

asked May 15 at 7:23

1 vote

3 answers

96 views

MySQL, join to find only max transaction count based of group

I am trying to find the max of transactions count based on type for each user. id user_id type 1 1 A 2 1 B 3 1 C 4 1 A 5 2 B 6 2 C 7 2 C 8 2 C I am expecting the output to be: user_id type count 1 A 2 ...

Mr.Singh

2,055

asked May 13 at 6:22
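
The "max transaction count per user" question above can be approached by counting per (user_id, type) first and then keeping only the rows whose count equals the user's maximum. The sketch below runs the SQL through Python's sqlite3 module as a stand-in for MySQL, so dialect differences are possible. The table name `transactions` is an assumption; the rows are the eight shown in the excerpt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, user_id INTEGER, type TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (1, 1, "A"), (2, 1, "B"), (3, 1, "C"), (4, 1, "A"),
        (5, 2, "B"), (6, 2, "C"), (7, 2, "C"), (8, 2, "C"),
    ],
)

# Step 1: count transactions per (user_id, type).
# Step 2: find each user's highest count and keep the matching type(s).
query = """
WITH counts AS (
    SELECT user_id, type, COUNT(*) AS cnt
    FROM transactions
    GROUP BY user_id, type
)
SELECT c.user_id, c.type, c.cnt
FROM counts AS c
JOIN (
    SELECT user_id, MAX(cnt) AS max_cnt
    FROM counts
    GROUP BY user_id
) AS m ON m.user_id = c.user_id AND m.max_cnt = c.cnt
ORDER BY c.user_id, c.type
"""
top_types = conn.execute(query).fetchall()
print(top_types)
```

Note that if two types tie for a user's highest count, this sketch returns both rows; an extra tiebreak rule would be needed to keep exactly one.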

1 vote

2 answers

69 views

Trying to use groupby on a Pandas DF to do a reverse lookup

I am trying to figure out how to code a reverse lookup in a pandas dataframe using groupby and looking for the owner of a max time. import pandas as pd df = {'Name': ['Mike', 'Lilly', 'Frank', 'Jane',...

Tim Romero

11

asked May 12 at 1:38

0 votes

1 answer

101 views

How can I use the COUNT and GROUP BY along with the CASE commands in SQLite to show the number of students with each letter grade?

I am trying to use the SQL commands COUNT and GROUP BY to show the number of students with each letter grade, but I'm having difficulty in doing so. A new column that I created contains a letter grade ...

Colton

1

asked May 10 at 14:21
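
For the letter-grade question above, one way to combine CASE with COUNT and GROUP BY is to compute the grade in the SELECT list and group by that expression's alias, which SQLite accepts. This is a minimal sketch using Python's sqlite3 module. The table `students`, its columns, the sample scores, and the grade cut-offs are all hypothetical, since the excerpt does not show the asker's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")  # hypothetical schema
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ann", 95), ("Ben", 84), ("Cy", 88), ("Dee", 61)],  # hypothetical data
)

# CASE maps each score to a letter grade (assumed cut-offs);
# GROUP BY on the alias then counts students per grade.
query = """
SELECT CASE
           WHEN score >= 90 THEN 'A'
           WHEN score >= 80 THEN 'B'
           WHEN score >= 70 THEN 'C'
           ELSE 'F'
       END AS letter_grade,
       COUNT(*) AS student_count
FROM students
GROUP BY letter_grade
ORDER BY letter_grade
"""
grade_counts = conn.execute(query).fetchall()
print(grade_counts)
```

If the letter grade is already stored in its own column, as the question suggests, the CASE expression can be dropped and the query reduces to grouping on that column directly.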

0 votes

0 answers

42 views

Calculating First Time Right (%) using a call logs dataset from a contact center

I am calculating the First Time Resolution (FTR) percentage from call logs using the following Python code with pandas and numpy. When I run the code on one CSV file (calls_logs_cleaned_2025-05-02.csv)...

IAIMT2024

1

asked May 6 at 1:43
