36,975 questions
-4 votes, 3 answers, 168 views
Returning multiple columns and count of duplicates with GROUP BY clause
I'm searching for duplicate values per column and need a count of them and data from some additional columns.
Table sample:
BillNr  Name     email
1000    Shakira  [email protected]
1001    Shakira  [email protected]
...
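One common way to do this, sketched in Python with the built-in sqlite3 module so it runs on its own: count each email in a grouped subquery with HAVING COUNT(*) > 1, then join that back to the table so BillNr and Name come along with the count. The table name `bills` and the example addresses are hypothetical, since the real addresses are redacted above. The same SQL shape should carry over to most engines.

```python
import sqlite3

def duplicates_with_counts(rows):
    """Return (BillNr, Name, email, dup_count) for every row whose email occurs more than once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bills (BillNr INTEGER, Name TEXT, email TEXT)")  # hypothetical table
    con.executemany("INSERT INTO bills VALUES (?, ?, ?)", rows)
    query = """
        SELECT b.BillNr, b.Name, b.email, d.dup_count
        FROM bills AS b
        JOIN (
            -- one row per duplicated email, with how often it occurs
            SELECT email, COUNT(*) AS dup_count
            FROM bills
            GROUP BY email
            HAVING COUNT(*) > 1
        ) AS d ON d.email = b.email
        ORDER BY b.BillNr
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Hypothetical sample: the question's email addresses are redacted.
sample = [
    (1000, "Shakira", "s@example.com"),
    (1001, "Shakira", "s@example.com"),
    (1002, "Bob", "bob@example.com"),
]
print(duplicates_with_counts(sample))
```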
0 votes, 2 answers, 103 views
Grouping of records in case values are null
We have a table with an identifier, key/value pairs, and start and end timestamps that indicate the valid period for the values.
MASTER_WORK_ORDR_ID  START_TS  END_TS  WORK_ORDR_ID_CTXT  ...
3 votes, 3 answers, 157 views
How to retrieve a sub-array from result of array_agg?
I have a SQL table in PostgreSQL 14 that looks something like this:
f_key  data1         data2   fit
1      {'a1', 'a2'}  null    3
1      {'b1', 'b2'}  {'b3'}  2
2      {'c1', 'c2'}  null    3
Note that data1 and data2 are arrays.
I need ...
1 vote, 5 answers, 98 views
Grouping rows, and then deleting only a sub range (based on their dates) from each of those groups
I use Postgres on my web server to record incoming queries in a table, calls2, basically writing a single row each time with lots of repeated information, such as a date field ("when...
0 votes, 0 answers, 108 views
GridDB SQL error while using GROUP BY RANGE
I am getting an error using GROUP BY RANGE in GridDB SQL. I am referring to the example mentioned in the docs: https://griddb.org/docs-en/manuals/GridDB_SQL_Reference.html#group-by-range
name: trend_data1
ts
...
3 votes, 4 answers, 223 views
Filter a pandas df: per group, keep only non-null rows if we have them, else keep a single null row
Hopefully the title is reasonably intuitive, edits welcome. Say I have this dataframe:
df = pd.DataFrame({'x': ['A', 'B', 'B', 'C', 'C', 'C', 'D', 'D'],
                   'y': [None, None, 1, 2, 3, 4,...
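A plain-Python sketch of the rule (not pandas code): collect the rows for each x, keep the non-null rows when a group has any, otherwise keep just that group's first row. The sample rows are hypothetical because the question's y list is cut off.

```python
from collections import defaultdict

def keep_non_null_else_one(rows):
    """rows: list of (x, y) pairs. Per x, keep rows with y not None; if there are none, keep one row."""
    groups = defaultdict(list)  # dicts keep insertion order, so groups stay in first-seen order
    for x, y in rows:
        groups[x].append((x, y))
    kept = []
    for grp in groups.values():
        non_null = [r for r in grp if r[1] is not None]
        kept.extend(non_null if non_null else grp[:1])  # fall back to a single null row
    return kept

# Hypothetical rows; the question's own y values are truncated.
rows = [("A", None), ("B", None), ("B", 1), ("C", 2), ("C", 3), ("D", None), ("D", None)]
print(keep_non_null_else_one(rows))
```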
0 votes, 1 answer, 48 views
How to group by geography in BigQuery
I have the following code:
SELECT
  h3s.h3id, h3s.geog,
  MIN(ST_DISTANCE(`carto-os`.carto.H3_CENTER(htsp.h3id), `carto-os`.carto.H3_CENTER(h3s.h3id)))
    OVER (PARTITION BY h3s.h3id)
FROM
...
1 vote, 2 answers, 104 views
Grouping data by season in R when winter includes December from the previous year
I have a dataset called TotalPhosphorus, and I want to assign seasons to each observation. However, I need the winter season to include December from the previous year and January–March from the ...
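The key step is giving December the following year's season label before grouping. A small Python sketch of that rule (the question uses R, so this only illustrates the arithmetic). Labeling each winter by the year of its January to March is an assumption, and the other seasons are left out because their definition is cut off above.

```python
from datetime import date

def winter_season_year(d):
    """Return the winter a date belongs to, labeled by the year of its January-March,
    or None if the date falls outside December-March."""
    if d.month == 12:
        return d.year + 1  # December counts toward the following year's winter
    if d.month <= 3:
        return d.year      # January-March belong to their own year's winter
    return None

print(winter_season_year(date(2020, 12, 15)))  # 2021
print(winter_season_year(date(2021, 2, 1)))    # 2021
print(winter_season_year(date(2021, 6, 1)))    # None
```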
3 votes, 4 answers, 129 views
Creating a group by loop using either single or multiple variables from a list in R
I am trying to write a loop that iterates over a list of single or multiple variables, groups by them, and then sums a column. Essentially, I am trying to pass entries from a list into the group_by() function so that it ...
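A Python sketch of the loop's logic, assuming each list entry is a list of one or more column names to group by; the data and column names are hypothetical. In dplyr, `group_by(across(all_of(vars)))` is one commonly used way to pass a character vector of names, but check it against your dplyr version.

```python
from collections import defaultdict

def grouped_sums(rows, group_vars, value_col):
    """Sum value_col over rows grouped by the columns named in group_vars."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[v] for v in group_vars)  # one or several grouping columns
        totals[key] += row[value_col]
    return dict(totals)

# Hypothetical data and column names
rows = [
    {"region": "N", "year": 2023, "sales": 10.0},
    {"region": "N", "year": 2024, "sales": 5.0},
    {"region": "S", "year": 2024, "sales": 7.0},
]
# Each entry is either a single variable or several variables grouped together
groupings = [["region"], ["year"], ["region", "year"]]
for g in groupings:
    print(g, grouped_sums(rows, g, "sales"))
```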
1 vote, 1 answer, 62 views
How to reverse a DolphinDB table aggregated by group by + toArray back to its original form?
I have an in-memory DolphinDB table created as follows:
ticker = `AAPL`IBM`IBM`AAPL`AMZN`AAPL`AMZN`IBM`AMZN
volume = 106 115 121 90 130 150 145 123 155;
t = table(ticker, volume);
t;
The output of ...
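A Python sketch (not DolphinDB script) of the round trip using the question's own data: group volumes into one list per ticker, then expand each list back into rows. Grouping alone loses the original row order, so this sketch also stores each value's original position; that extra bookkeeping is an assumption about what "original form" should mean.

```python
from collections import defaultdict

ticker = ["AAPL", "IBM", "IBM", "AAPL", "AMZN", "AAPL", "AMZN", "IBM", "AMZN"]
volume = [106, 115, 121, 90, 130, 150, 145, 123, 155]

# Forward step (like group by + toArray): one list per ticker.
# Each entry keeps its original row position so the order can be restored later.
grouped = defaultdict(list)
for i, (t, v) in enumerate(zip(ticker, volume)):
    grouped[t].append((i, v))

# Reverse step: emit one row per array element, then sort by original position
rows = sorted((i, t, v) for t, pairs in grouped.items() for i, v in pairs)
restored_ticker = [t for _, t, _ in rows]
restored_volume = [v for _, _, v in rows]
print(list(zip(restored_ticker, restored_volume)))
```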
2 votes, 1 answer, 88 views
How to combine multiple rows of Pandas dataframe into one row using a key [duplicate]
I am trying to manipulate a CSV using Pandas and I need to get the data into the format of one row per ID.
This is an example of what I am trying to accomplish:
From:
df = pd.DataFrame({
'ID': [1, 1, ...
0 votes, 1 answer, 94 views
R arrange after grouping [duplicate]
I have noticed that although
df %>%
  group_by(firm) %>%
  arrange(week) %>%
  mutate(lag_sales = lag(sales)) %>%
  ungroup()
ignores the grouping but calculates the correct lags as the ...
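For context: dplyr's `arrange()` ignores grouping unless `.by_group = TRUE` is set, while `lag()` inside a grouped `mutate()` still works within each firm, so rows sorted by week are already in the right order for the lag. A Python sketch of the same per-group lag, with hypothetical (firm, week, sales) rows:

```python
from itertools import groupby
from operator import itemgetter

def lag_by_group(rows):
    """rows: (firm, week, sales) tuples. Returns (firm, week, sales, lag_sales),
    with the lag taken within each firm only."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))  # sort by firm, then week
    for _, grp in groupby(ordered, key=itemgetter(0)):
        prev = None  # the first week of each firm has no previous value
        for firm, week, sales in grp:
            out.append((firm, week, sales, prev))
            prev = sales
    return out

sales_rows = [("x", 2, 20), ("y", 1, 5), ("x", 1, 10), ("y", 2, 7)]  # hypothetical
print(lag_by_group(sales_rows))
```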
2 votes, 1 answer, 78 views
New columns with values from other rows based on adjacency
I have a dataframe in R that looks like this (spaced to ease readability):
utterance word syllable label syll_start syll_end
1 1 1 NA 1.1 2
1 1 ...
0 votes, 0 answers, 27 views
Python display and count unique elements from a dataset [duplicate]
I have a dataset populated from an API call to Splunk.
The dataset contains the following:
time                 destip      destport  transport
2025-09-17 22:03:09  172.16.5.1  53        UDP
2025-09-17 22:03:10  172.16.5.1  53        UDP
...
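A minimal sketch using `collections.Counter`, assuming each result row is a dict keyed by the column names shown above. The first two rows mirror the sample; the third is hypothetical, and the port is kept as a string on the assumption that the API returns text.

```python
from collections import Counter

events = [
    {"time": "2025-09-17 22:03:09", "destip": "172.16.5.1", "destport": "53", "transport": "UDP"},
    {"time": "2025-09-17 22:03:10", "destip": "172.16.5.1", "destport": "53", "transport": "UDP"},
    {"time": "2025-09-17 22:03:11", "destip": "10.0.0.8", "destport": "443", "transport": "TCP"},  # hypothetical
]

# Count each distinct (destip, destport, transport) combination, ignoring the timestamp
counts = Counter((e["destip"], e["destport"], e["transport"]) for e in events)
for (ip, port, proto), n in counts.most_common():
    print(f"{ip} {port} {proto}: {n}")
```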
3 votes, 2 answers, 183 views
Group by column, make a new column with label corresponding to highest value in another column
Here is an example of my data:
sound word part syllable pitch_peak
sound-1 mary subject 1 3.1
sound-1 mary subject 2 1.9
sound-1 studied verb 1 ...
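A Python sketch, assuming the grouping columns are (sound, word) and that the new column should hold the syllable with the highest pitch_peak in each group. The pitch values for "studied" are hypothetical because the sample is truncated. On ties, the first syllable seen wins.

```python
rows = [
    ("sound-1", "mary", "subject", 1, 3.1),
    ("sound-1", "mary", "subject", 2, 1.9),
    ("sound-1", "studied", "verb", 1, 2.4),  # hypothetical pitch value
    ("sound-1", "studied", "verb", 2, 2.8),  # hypothetical row
]

best = {}  # (sound, word) -> (syllable, pitch_peak) of the highest peak seen so far
for sound, word, part, syllable, peak in rows:
    key = (sound, word)
    if key not in best or peak > best[key][1]:  # strict '>' keeps the first syllable on ties
        best[key] = (syllable, peak)

# New column: the syllable holding the group's highest pitch_peak, repeated on every row
labeled = [row + (best[(row[0], row[1])][0],) for row in rows]
for row in labeled:
    print(row)
```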
4 votes, 0 answers, 138 views
Hourly true average between timestamps [closed]
I’m storing IoT readings in a GridDB container and need one row per hour with the true average of the points that actually fall inside each hour (not interpolated values):
ts_bucket ...
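A Python sketch of the bucketing logic (not GridDB SQL): floor each timestamp to the start of its hour and average only the readings that fall inside that hour, so hours without readings simply do not appear. The readings are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def hourly_averages(readings):
    """readings: (datetime, value) pairs. Returns {hour_start: mean of the values in that hour}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in readings:
        bucket = ts.replace(minute=0, second=0, microsecond=0)  # floor to the start of the hour
        sums[bucket] += value
        counts[bucket] += 1
    # Only hours that actually contain readings appear; nothing is interpolated
    return {b: sums[b] / counts[b] for b in sorted(sums)}

readings = [  # hypothetical IoT readings
    (datetime(2025, 1, 1, 10, 5), 20.0),
    (datetime(2025, 1, 1, 10, 50), 22.0),
    (datetime(2025, 1, 1, 11, 15), 30.0),
]
print(hourly_averages(readings))
```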
0 votes, 1 answer, 59 views
How can I aggregate all columns with a 'number' type in Power Query
I'm trying to use Power Query to aggregate some invoicing columns by project number.
I'm currently using a group by function which looks at the project number and then aggregates each ...
1 vote, 1 answer, 126 views
group_by with polars concatenating values
I have a polars dataframe that I want to group by and concatenate the unique values in as a single entry.
In pandas, I do:
def unique_colun_values(x):
    return('|'.join(set(x)))
dd=pd.DataFrame({'...
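One detail in the pandas helper: `'|'.join(set(x))` can come out in a different order between runs, because Python randomizes string hashing per process by default. A plain-Python sketch (not the polars API) that keeps first-seen order instead, with hypothetical data:

```python
from collections import defaultdict

def concat_unique(rows, sep="|"):
    """rows: (key, value) pairs. Returns {key: unique values joined by sep, in first-seen order}."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # dict.fromkeys drops duplicates but, unlike set(), keeps first-seen order
    return {k: sep.join(dict.fromkeys(vals)) for k, vals in groups.items()}

rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z")]  # hypothetical data
print(concat_unique(rows))  # {'a': 'x|y', 'b': 'z'}
```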
3 votes, 1 answer, 99 views
How to group by day in GridDB Cloud without manually concatenating year, month, and day?
Table schema:
CREATE TABLE WeatherReadings
(
    ts TIMESTAMP,
    temp DOUBLE
);
Sample data:
INSERT INTO WeatherReadings (ts, temp)
VALUES
    (TIMESTAMP('2025-08-22T01:05:00Z'), 20.5),
    (TIMESTAMP('...
1 vote, 2 answers, 100 views
Remove items within pandas DataFrameGroupBy groups
I have a dataframe df made up of n group columns and one data column, "data". This dataframe is then grouped on the n group columns.
df = pd.DataFrame(data={"g0": ["foo", ...
0 votes, 1 answer, 79 views
XSLT - For-Each-Group - GroupBy not working on 2 group-by values
I am trying to use XSLT in my application (OIC) where, based on the input structure, I have to construct an output file that filters the records based on 2 elements.
Input structure:
<?xml version='1.0' ...
7 votes, 3 answers, 444 views
How to sort pandas groups by (multiple/all) values of the groups?
I am trying to do a somewhat complicated group and sort operation in pandas. I want to sort the groups by their values in ascending order, using successive values for tiebreaks as needed.
I have read ...
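A plain-Python sketch of the ordering rule, assuming each group should be compared by its values sorted ascending: Python compares lists element by element, so successive values act as tiebreaks, and a group whose values are a prefix of another's sorts first. The data is hypothetical.

```python
from collections import defaultdict

def order_groups(rows):
    """rows: (group, value) pairs. Returns group keys ordered by their sorted values,
    compared element by element so later values break ties."""
    groups = defaultdict(list)
    for g, v in rows:
        groups[g].append(v)
    # Lists compare lexicographically, which gives the successive-tiebreak rule
    return sorted(groups, key=lambda g: sorted(groups[g]))

rows = [("p", 3), ("p", 1), ("q", 1), ("q", 2), ("r", 0), ("s", 1), ("s", 2), ("s", 5)]
print(order_groups(rows))
```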
0 votes, 2 answers, 122 views
Using GROUP BY - check for multiple conditions with a WHERE or HAVING clause for a single ID [closed]
I have a table of customer data. I will be joining it to a location table. Customer ID is distinct but Location ID is not because multiple customers can belong to one location. Each customer is ...
0 votes, 2 answers, 81 views
DAX concatenate list of a column value (ex. contract) grouped by date
I'm trying to create a list of contracts that expire by date. I looked on many sites for a solution.
I have a measure that calculates the date, and I need a calculated table with a summarized ...
-2 votes, 2 answers, 189 views
Why does grouping a pandas series using the same series make no sense?
In the code example below I am grouping a pandas series using the same series but with a modified index.
The groups in the end make no sense. There is no warning or error.
Could you please help me ...
2 votes, 2 answers, 94 views
Pandas dt accessor or groupby function returning decimal numbers instead of integers in index labels where some series values are NA
We're trying to group date counts by month, and the index values come back as decimals instead of integers when the series contains any NaT/NA values.
Simplified reproducible example:
import ...
0 votes, 2 answers, 113 views
How to pick the latest record with GROUP BY userId
In my application I want to find the latest duty of each user from the 'StaffDuty' table using a Hibernate query (i.e. HQL). Below is my query.
query = session.createQuery("FROM StaffDuty where deptId....
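A sketch of one standard approach in plain SQL, run through Python's sqlite3 module: keep each row whose date equals its user's maximum, via a correlated subquery. The columns other than userId are hypothetical, and this is not HQL; HQL does accept subqueries in WHERE, but the exact form should be checked against the Hibernate docs. A user with two duties on the same latest date gets two rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical columns: the question's StaffDuty schema is only partly shown
con.execute("CREATE TABLE StaffDuty (id INTEGER, userId INTEGER, dutyDate TEXT)")
con.executemany(
    "INSERT INTO StaffDuty VALUES (?, ?, ?)",
    [(1, 7, "2025-01-01"), (2, 7, "2025-01-05"), (3, 8, "2025-01-03")],
)

# Keep each row whose dutyDate equals the latest dutyDate for that user
latest = con.execute("""
    SELECT s.id, s.userId, s.dutyDate
    FROM StaffDuty AS s
    WHERE s.dutyDate = (
        SELECT MAX(s2.dutyDate) FROM StaffDuty AS s2 WHERE s2.userId = s.userId
    )
    ORDER BY s.userId
""").fetchall()
print(latest)  # [(2, 7, '2025-01-05'), (3, 8, '2025-01-03')]
```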
0 votes, 1 answer, 35 views
How to group by a column and calculate correlation coefficients between multiple columns?
I'm encountering some issues when trying to perform grouped correlation calculations in DolphinDB. Here's my scenario:
I'm using DolphinDB to calculate correlations between multiple columns in a table....
0 votes, 3 answers, 105 views
PySpark groupBy().applyInPandas() fails with INVALID_PANDAS_UDF despite correct signature and schema for GROUPED_MAP
NOTE: This question has many related questions on Stack Overflow, but I was unable to get my answer from any of them.
I'm attempting to parallelize Prophet time series model training across multiple ...
1 vote, 2 answers, 106 views
How do I get the last valid (non-null, non-zero) value per day in a time-series SQL query?
I’m working with time-series data in SQL Server and need to retrieve the last valid value for each day. A valid value is defined as one that is non-null and not zero.
The challenge is that data points ...
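A sketch of the idea in SQLite via Python's sqlite3 (not SQL Server syntax): filter out null and zero values first, then keep the latest remaining timestamp per calendar day. In SQL Server the day would typically come from `CAST(ts AS date)` instead of SQLite's `date()`. The table and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (ts TEXT, value REAL)")  # hypothetical table
con.executemany("INSERT INTO readings VALUES (?, ?)", [
    ("2025-03-01 08:00:00", 5.0),
    ("2025-03-01 17:00:00", 0.0),   # zero: not valid
    ("2025-03-01 23:00:00", None),  # null: not valid
    ("2025-03-02 09:00:00", 7.5),
])

query = """
    WITH valid AS (
        SELECT ts, value FROM readings
        WHERE value IS NOT NULL AND value <> 0
    )
    SELECT date(v.ts) AS day, v.value
    FROM valid AS v
    -- keep the row holding the latest valid timestamp of its day
    WHERE v.ts = (SELECT MAX(v2.ts) FROM valid AS v2 WHERE date(v2.ts) = date(v.ts))
    ORDER BY day
"""
last_valid = con.execute(query).fetchall()
print(last_valid)  # [('2025-03-01', 5.0), ('2025-03-02', 7.5)]
```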
1 vote, 1 answer, 65 views
XSLT for-each with using last and aggregation
I am not sure if this is possible with XSLT but I am trying to get the below XML into a format where it is name, title, date (if same date then only get date once), last value of In time (might not be ...
1 vote, 1 answer, 72 views
TimeScaleDb/Postgres: Materialized Views(COGG): GROUP BY: group by certain field values
What I'm currently doing is this:
SELECT
  time_bucket('60 min', raw_data.timestamp) AS time_60min,
  COUNT(raw_data.vehicle_class) AS "count",
  raw_data.vehicle_class AS "...
1 vote, 0 answers, 36 views
Use Object.groupBy function to group by a variable [duplicate]
How can I use the Object.groupBy function with a variable?
For example:
const inventory = [
  { Phase: "Phase 1", Step: "Step 1", Task: "Task 1", Value: "5" },
...
2 votes, 1 answer, 285 views
Dataframe behavior: Pandas 1.1.5 vs 2.3.0
I recently had to update the virtual environment for one of my libraries from Python 3.7 to 3.10, which also involved updating Pandas from 1.1.5 to 2.3.0.
In the previous virtual environment, this ...
0 votes, 0 answers, 69 views
Preprocessing Data with Scale and then Binarize in Python
I am working on some proofs of concept for ML and want to try an unusual scaling method. I would like to group my data, then "scale" it and binarize it. Basically I ...
0 votes, 1 answer, 92 views
MS Access Reports: How do I group by two fields on the same level (OR?)
I have a database of music manuscripts that looks like the below diagram.
A 'Source item' belongs to a certain manuscript (source). A source item is then categorized as EITHER a 'Section' of a 'Piece' ...
1 vote, 1 answer, 134 views
How do I define a week start frequency in Pandas?
I am trying to come up with a frequency in Pandas that represents the start of a calendar week (configurable by week start). For example, for all dates from 2025-01-06 (Monday) to 2025-01-12 (Sunday), ...
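A sketch with plain `datetime` arithmetic rather than a pandas frequency: step back from each date to the most recent occurrence of the chosen first weekday, and use the resulting date as the group key. The `first_weekday` parameter follows Python's Monday=0 to Sunday=6 convention; its name is hypothetical.

```python
from datetime import date, timedelta

def week_start(d, first_weekday=0):
    """Return the first day of the week containing d.
    first_weekday uses datetime's convention: Monday=0 ... Sunday=6."""
    return d - timedelta(days=(d.weekday() - first_weekday) % 7)

print(week_start(date(2025, 1, 12)))    # 2025-01-06 (Monday-start week)
print(week_start(date(2025, 1, 8), 6))  # 2025-01-05 (Sunday-start week)
```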
2 votes, 1 answer, 53 views
Only display the top N rows in a dataframe that was aggregated with statistical functions but keep the primary sort
Suppose I have this:
ISresult = h25.groupby(['month','impactedservice']).agg({'resolvetime': ['count','median','mean', 'min', 'max','std']})
The column list looks like this:
[('resolvetime', 'count'),...
0 votes, 1 answer, 35 views
Calculate difference between two rows by group in DolphinDB?
Sharing a common DolphinDB use case and solution for data processing.
I have a table with four columns: order_book_id, date, Q, and revenue.
I want to group the data by order_book_id and date, and ...
0 votes, 3 answers, 120 views
Min and Max value on multiple cells group by third column value
I would like to extract the MIN and MAX from multiple columns (start_1, end_1, start_2, end_2), grouped by "name".
I have data that looks like this:
start_1 end_1 start_2 end_2 name
100 ...
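A sketch via Python's sqlite3, assuming the goal is the smallest and largest value across all four columns per name: SQLite's multi-argument `MIN()`/`MAX()` work row by row, and the outer single-argument versions aggregate per group. Many other engines spell the row-wise step `LEAST()`/`GREATEST()`. The rows are hypothetical since the sample is truncated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spans (start_1 INT, end_1 INT, start_2 INT, end_2 INT, name TEXT)")
con.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (100, 150, 120, 180, "a"),
    (90, 110, 130, 200, "a"),
    (300, 310, 305, 330, "b"),
])

# Inner multi-argument MIN/MAX are SQLite scalar functions (per row);
# outer MIN/MAX aggregate across the rows of each name.
query = """
    SELECT name,
           MIN(MIN(start_1, end_1, start_2, end_2)) AS min_value,
           MAX(MAX(start_1, end_1, start_2, end_2)) AS max_value
    FROM spans
    GROUP BY name
    ORDER BY name
"""
minmax = con.execute(query).fetchall()
print(minmax)  # [('a', 90, 200), ('b', 300, 330)]
```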
1 vote, 3 answers, 127 views
How to divide one group of rows by another one in a pandas long format DataFrame to compute e.g ratios?
In pandas, I have the following long-format DataFrame with one binary variable, "Metric", with two modalities (number of rooms in the residence, square meters of the residence):
pd.DataFrame({'State': {0: 'New ...
0 votes, 1 answer, 97 views
Does BigQuery `GROUP by grouping set` perform better than `Group By Union`
BigQuery has newly added GROUP BY GROUPING SETS [1].
Its syntax is simpler than the traditional GROUP BY ... UNION approach. I wonder if it also performs much better, because grouping sets only scan the source ...
-1 votes, 1 answer, 57 views
Group by multiple strings into one field in Vertica [closed]
With this data:
name  movie
john  big daddy
bob   titanic
john  avatar
I want the output to be:
name  movie
john  big daddy, avatar
bob   titanic
I tried this:
SELECT name, LIST_AGG(movie)
from people.table
...
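For comparison, the same aggregation in SQLite (via Python's sqlite3) uses `group_concat`; the data is the question's own. If I recall correctly, Vertica's counterpart is `LISTAGG`, without the underscore, but check its documentation for the separator option. Concatenation order inside a group is not guaranteed here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, movie TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [("john", "big daddy"), ("bob", "titanic"), ("john", "avatar")])

# group_concat is SQLite's string aggregate; order within a group is not guaranteed
rows = con.execute(
    "SELECT name, group_concat(movie, ', ') FROM people GROUP BY name ORDER BY name"
).fetchall()
print(rows)
```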
0 votes, 2 answers, 81 views
How do I sum over different combinations of variables in DBV?
The following is my table. I want to sum across all the different combinations and put each sum in a separate column, not all in the same column.
data:
Subject  Var1  Var2  Var3  Var4  Constant1  Constant2
ONE      1     2     1     1     A ...
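A Python sketch, assuming "all combinations" means every combination of two or more of Var1 to Var4, each stored in its own new column; the `+`-joined column names are a hypothetical naming scheme, and Constant2 is left out because its value is cut off. With four variables this adds 6 + 4 + 1 = 11 columns.

```python
from itertools import combinations

def combination_sums(row, var_cols, min_size=2):
    """Return a copy of row with one extra column per combination of var_cols (size >= min_size)."""
    out = dict(row)
    for size in range(min_size, len(var_cols) + 1):
        for combo in combinations(var_cols, size):
            out["+".join(combo)] = sum(row[c] for c in combo)  # hypothetical column naming
    return out

row = {"Subject": "ONE", "Var1": 1, "Var2": 2, "Var3": 1, "Var4": 1, "Constant1": "A"}
result = combination_sums(row, ["Var1", "Var2", "Var3", "Var4"])
print(result["Var1+Var2"], result["Var1+Var2+Var3+Var4"])  # 3 5
```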
0 votes, 0 answers, 19 views
In DolphinDB, how to calculate stock residual return based on a table using the ols function?
I have a table "clean_factor" where the column "y" indicates the stock returns and the subsequent columns indicate factor exposures. How do I calculate the daily residual return of each stock with the ...
1 vote, 3 answers, 94 views
In SQL, GROUP BY using similar group_name values
How can I perform a GROUP BY in SQL when the group_name values are similar but not exactly the same?
In my dataset, the group_name values may differ slightly (e.g., "Apple Inc.", "...
1 vote, 2 answers, 69 views
MySQL, join to find only max transaction count based on group
I am trying to find, for each user, the transaction type with the highest transaction count.
id  user_id  type
1   1        A
2   1        B
3   1        C
4   1        A
5   2        B
6   2        C
7   2        C
8   2        C
I am expecting the output to be:
user_id  type  count
1        A     2
...
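A sketch using the question's sample rows, run through Python's sqlite3 (the table name `transactions` is assumed): count per (user_id, type), then join against each user's maximum count. MySQL 8.0 and later support the same `WITH` syntax. If two types tie for a user's maximum, both rows are returned.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (id INTEGER, user_id INTEGER, type TEXT)")  # table name assumed
con.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (1, 1, "A"), (2, 1, "B"), (3, 1, "C"), (4, 1, "A"),
    (5, 2, "B"), (6, 2, "C"), (7, 2, "C"), (8, 2, "C"),
])

query = """
    WITH counts AS (
        SELECT user_id, type, COUNT(*) AS cnt
        FROM transactions
        GROUP BY user_id, type
    )
    SELECT c.user_id, c.type, c.cnt
    FROM counts AS c
    JOIN (SELECT user_id, MAX(cnt) AS max_cnt FROM counts GROUP BY user_id) AS m
      ON m.user_id = c.user_id AND m.max_cnt = c.cnt
    ORDER BY c.user_id
"""
top_types = con.execute(query).fetchall()
print(top_types)  # [(1, 'A', 2), (2, 'C', 3)]
```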
1 vote, 2 answers, 69 views
Trying to use groupby on a Pandas DF to do a reverse lookup
I am trying to figure out how to code a reverse lookup in a pandas DataFrame using groupby, looking for the owner of the max time.
import pandas as pd
df = {'Name': ['Mike', 'Lilly', 'Frank', 'Jane',...
0 votes, 1 answer, 101 views
How can I use the COUNT and GROUP BY along with the CASE commands in SQLite to show the number of students with each letter grade?
I am trying to use the SQL commands COUNT and GROUP BY to show the number of students with each letter grade, but I'm having difficulty doing so. A new column that I created contains a letter grade ...
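A sketch via Python's sqlite3: compute the letter grade in a `CASE` expression, give it an alias, and group on that alias (SQLite allows grouping by a select-list alias). The table, scores, and grade cutoffs are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (name TEXT, score REAL)")  # hypothetical table
con.executemany("INSERT INTO students VALUES (?, ?)",
                [("ann", 95), ("ben", 83), ("cai", 88), ("dee", 71), ("eve", 40)])

# Compute the letter grade with CASE, then count students per grade
query = """
    SELECT CASE
             WHEN score >= 90 THEN 'A'
             WHEN score >= 80 THEN 'B'
             WHEN score >= 70 THEN 'C'
             WHEN score >= 60 THEN 'D'
             ELSE 'F'
           END AS letter_grade,
           COUNT(*) AS num_students
    FROM students
    GROUP BY letter_grade
    ORDER BY letter_grade
"""
grade_counts = con.execute(query).fetchall()
print(grade_counts)  # [('A', 1), ('B', 2), ('C', 1), ('F', 1)]
```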
0 votes, 0 answers, 42 views
Calculating First Time Right (%) using a call logs dataset from a contact center
I am calculating the First Time Resolution (FTR) percentage from call logs using the following Python code with pandas and numpy. When I run the code on one CSV file (calls_logs_cleaned_2025-05-02.csv)...