What is the equivalent of pandas df.groupby('v1').apply(lambda x: x['v2'].nunique()) in PostgreSQL?
I.e., given a table, I want to know the number of unique values of v2 for each v1.
Maybe you mean
SELECT v1, count(DISTINCT v2)
FROM df
GROUP BY v1;
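If you want to sanity-check that query without a PostgreSQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite's count(DISTINCT ...) behaves like PostgreSQL's for this simple case. The sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption:
# count(DISTINCT ...) with GROUP BY behaves the same for this query).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (v1 TEXT, v2 INTEGER)")

# Hypothetical sample rows, not taken from the original question.
rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3), ("b", None)]
conn.executemany("INSERT INTO df VALUES (?, ?)", rows)

# Same query as above, plus ORDER BY so the row order is deterministic.
query = """
    SELECT v1, count(DISTINCT v2)
    FROM df
    GROUP BY v1
    ORDER BY v1
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

Note that count(DISTINCT v2) skips NULLs, which matches the default behaviour of pandas nunique() (dropna=True).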
Also check this post on array_agg. It was helpful to me. It collects the values of each group into an array. I just did something like:
SELECT directory, ARRAY_AGG(file_name)
FROM table
WHERE type = 'ZIP'
GROUP BY directory;
And the result was something like:
 parent_directory        | array_agg
-------------------------+---------------------------------
 /home/postgresql/files  | {zip_1.zip,zip_2.zip,zip_3.zip}
 /home/postgresql/files2 | {file1.zip,file2.zip}
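For readers coming from Python, the query above does roughly what the following plain-Python sketch does. This is only an illustration of the idea, not how PostgreSQL implements it. The function name collect_by_directory and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of (directory, file_name, type); not real data.
rows = [
    ("/home/postgresql/files", "zip_1.zip", "ZIP"),
    ("/home/postgresql/files", "zip_2.zip", "ZIP"),
    ("/home/postgresql/files", "notes.txt", "TXT"),
    ("/home/postgresql/files2", "file1.zip", "ZIP"),
]

def collect_by_directory(rows, wanted_type="ZIP"):
    """Group file names per directory, keeping only rows of wanted_type."""
    grouped = defaultdict(list)
    for directory, file_name, file_type in rows:
        if file_type == wanted_type:              # WHERE type = 'ZIP'
            grouped[directory].append(file_name)  # ARRAY_AGG(file_name)
    return dict(grouped)                          # GROUP BY directory

files_by_dir = collect_by_directory(rows)
print(files_by_dir)
```

In PostgreSQL the order of elements inside each array is not guaranteed unless you ask for it, e.g. ARRAY_AGG(file_name ORDER BY file_name).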
This post also helped me a lot: "Group By" in SQL and Python Pandas. It basically says that it is more convenient to use only SQL when possible, but that Python Pandas can be useful to achieve extra functionalities in the filtering process.
I hope it helps