
Say I have 3 lists of data that I am using to build a new query. I need to take these lists and return their values alongside the matching rows.

So my question is this:

Is there a standard method of taking a list and using it as a column?

I will need to use multiple lists as columns, where one list supplies the "JOIN ON" or "WHERE IN" portion.

The results from my first query are used to build my 3 lists.

Say I get back this data:

[[ID, TYPE, OTHER],
 [1, C, S], 
 [2, C, O],
 [3, D, D],
 [4, D, H]]

Then I convert that table/2D array into the following Python lists:

[1, 2, 3, 4]
[C, C, D, D]
[S, O, D, H]

Now I want to use those 2 lists as columns in a select statement like this:

select [C, C, D, D] as TYPE # These 2 list are needed to return in the correct order
      ,[S, O, D, H] as OTHER  # as it relates to [1, 2, 3, 4] in the WHERE.
      ,table.value
      ,table.color
From table
where table.value in [1, 2, 3, 4]  # one list is used to deal with the where 

table contains 2 columns:

VALUE    COLOR
1        Red
2        Green
3        Blue
4        Black

Results should look like this:

TYPE    OTHER    VALUE    COLOR
C       S        1        Red
C       O        2        Green
D       D        3        Blue
D       H        4        Black
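On SQL Server, one standard way to treat lists like these as columns is a table value constructor (VALUES ...) joined to the real table. A minimal sketch that builds such a query string from the three lists in Python (table and column names are taken from the question; this is untested against a live server, and inlining values like this assumes they are trusted):

```python
# Build a SQL Server query that joins a VALUES table constructor
# (carrying ID/TYPE/OTHER from the Python lists) to the real table.
ids = [1, 2, 3, 4]
types = ['C', 'C', 'D', 'D']
others = ['S', 'O', 'D', 'H']

rows = ", ".join(
    f"({i}, '{t}', '{o}')" for i, t, o in zip(ids, types, others)
)
sql = (
    "SELECT v.TYPE, v.OTHER, t.VALUE, t.COLOR "
    "FROM (VALUES " + rows + ") AS v (ID, TYPE, OTHER) "
    "JOIN [table] t ON t.VALUE = v.ID"
)
print(sql)
```

The JOIN on v.ID also replaces the WHERE ... IN clause, since only matching VALUEs come back.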
  • Are you using MS SQL Server or Oracle? Commented Nov 13, 2019 at 21:01
  • @jarlh both. Some of my queries will be directly on the SQL Server and some will be using OPENQUERY against a linked Oracle server. I can get data with single values as columns but I need to be able to do this with a list of values. Otherwise I will have to build 20 Union statements and I really don't think that is efficient. Commented Nov 13, 2019 at 21:03
  • Not sure how you would do this in Python but what you are describing is a table-valued parameter in SQL Server. Commented Nov 13, 2019 at 21:08
  • @SeanLange this has to be passed in a query string that is sent to the SQL Server via pyodbc using the ODBC standard drivers. Again, I can do this with single values, just not lists. Commented Nov 13, 2019 at 21:09
  • Well as you have found out you can't pass arrays of values in a standard string. And you have a two dimensional array at that. You can use a string splitter and pass in delimited lists. But you are going to need one that returns the ordinal position of each element so you can reassemble a delimited list and get the values lined up correctly. Here is a great example of one of those. sqlservercentral.com/articles/… Commented Nov 13, 2019 at 21:12
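As the comments note, pyodbc cannot take a Python list as a single parameter, but it can take one ? placeholder per list element, which keeps the values out of the query string. A sketch (the pyodbc cursor is assumed and left commented out):

```python
ids = [1, 2, 3, 4]
placeholders = ", ".join("?" for _ in ids)  # builds "?, ?, ?, ?"
sql = f"SELECT VALUE, COLOR FROM [table] WHERE VALUE IN ({placeholders})"
# cursor.execute(sql, ids)  # assumed pyodbc cursor
print(sql)
```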

2 Answers


Update:

1, Convert the table into 3 lists:

import cx_Oracle
import pandas

db = cx_Oracle.connect('*******', '********', '*******')
conn = db.cursor()

sql_test = '''SELECT * FROM TEST'''

sql_table = '''SELECT * FROM "TABLE" '''

df_test = pandas.read_sql_query(sql_test,db)
df_table = pandas.read_sql_query(sql_table,db)

ser_aggCol = df_test.aggregate(lambda x: [x.tolist()], axis=0).map(lambda x: x[0])  # one Python list per column

print(ser_aggCol, sep='\n', end='\n\n\n')

print(ser_aggCol['ID'])
print(ser_aggCol['TYPE'])
print(ser_aggCol['OTHER'])

Output: the ID, TYPE, and OTHER columns printed as Python lists (screenshot omitted).
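The same three lists can also be pulled out of the dataframe more directly with Series.tolist(). A sketch using the question's data in place of a live connection:

```python
import pandas

# Stand-in for df_test = pandas.read_sql_query(sql_test, db)
df_test = pandas.DataFrame(
    {"ID": [1, 2, 3, 4], "TYPE": ["C", "C", "D", "D"], "OTHER": ["S", "O", "D", "H"]}
)
ids = df_test["ID"].tolist()
types = df_test["TYPE"].tolist()
others = df_test["OTHER"].tolist()
print(ids, types, others)
```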

2, Use a while loop to concat the dataframes row by row.

sql_max = '''SELECT MAX(ID) FROM TEST'''
conn.execute(sql_max)
max_id = 0
for result in conn:
    max_id = result[0]

print(max_id)

i = 0
sql_first_row = (
    f"""SELECT '{ser_aggCol['TYPE'][i]}' AS TYPE, '{ser_aggCol['OTHER'][i]}' AS OTHER, """
    f"""VALUE, COLOR FROM "TABLE" WHERE VALUE = {ser_aggCol['ID'][i]}"""
)
df_result = pandas.read_sql_query(sql_first_row, db)

# One query per remaining row, appended to the result frame
while i + 1 <= max_id - 1:
    new_sql = (
        f"""SELECT '{ser_aggCol['TYPE'][i + 1]}' AS TYPE, '{ser_aggCol['OTHER'][i + 1]}' AS OTHER, """
        f"""VALUE, COLOR FROM "TABLE" WHERE VALUE = {ser_aggCol['ID'][i + 1]}"""
    )
    df_new = pandas.read_sql_query(new_sql, db)
    df_result = pandas.concat([df_result, df_new])
    i = i + 1

print(df_result)

Output: the combined TYPE/OTHER/VALUE/COLOR rows (screenshot omitted).
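The loop above issues one query per row. An alternative sketch fetches all matching rows in one query and lines the lists up with a single pandas.merge; literal data stands in for the database calls here, so only the merge step is shown:

```python
import pandas

# The three lists built from the first query (data from the question)
df_lists = pandas.DataFrame(
    {"VALUE": [1, 2, 3, 4], "TYPE": ["C", "C", "D", "D"], "OTHER": ["S", "O", "D", "H"]}
)
# Stand-in for one read_sql_query with WHERE VALUE IN (...)
df_table = pandas.DataFrame(
    {"VALUE": [1, 2, 3, 4], "COLOR": ["Red", "Green", "Blue", "Black"]}
)
# One merge attaches TYPE/OTHER to every fetched row, in order
df_result = df_lists.merge(df_table, on="VALUE")
print(df_result)
```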


Original Post:

My approach is:

1, Read the SQL results into the dataframes

pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None)

2, Join the dataframes using Pandas

DataFrame.join(self, other, on=None, how='left', lsuffix='', rsuffix='', sort=False)


Testing:

I do not have the SQL Server client on my new PC, so I created those two tables in Oracle only. You just need to add the SQL Server connection in your Python. Let me know if you get stuck here.

Created two tables in Oracle:

Test: (screenshot of the TEST table omitted)

Table: (screenshot of the "TABLE" table omitted)

Then Python:

import cx_Oracle
import pandas

db = cx_Oracle.connect('********', '********', '********')
conn = db.cursor()

sql_test = '''SELECT * FROM TEST'''

sql_table = '''SELECT * FROM "TABLE" '''

df_test = pandas.read_sql_query(sql_test,db)
df_table = pandas.read_sql_query(sql_table,db)

print(df_test) 
print(df_table)

print(df_test.set_index('ID').join(df_table.set_index('VALUE')))

Output:

   ID TYPE OTHER
0   1    C     S
1   2    C     O
2   3    D     D
3   4    D     H

   VALUE  COLOR
0      1    Red
1      2  Green
2      3   Blue
3      4  Black

ID  TYPE OTHER  COLOR                  
1     C     S    Red
2     C     O  Green
3     D     D   Blue
4     D     H  Black

6 Comments

I have updated my question to provide more context on how I plan to use the data. I will have to wait till I get back to work tomorrow to test the code against our server.
@Mike-SMT I just updated my answer. Let me know if you have any question.
That is interesting. Is df_test.set_index('ID').join(df_table.set_index('VALUE')) basically the same thing as JOIN ON? My concern here is that it looks to be a full table scan between the 2 tables, and that will be a hefty price on the spool space. Pulling down all the data will not work. I need to somehow send the column of, say, TEST as a select statement to the Oracle server so it is only pulling back the values I need.
@Mike-SMT, Could you test it? As far as I know, pandas now has one of the fastest in-memory database join operators out there.
@Mike-SMT, Also, you can use DataFrame.where() after you joined the tables to pull the data you need.

Ok so here is the solution I have to go with.

Instead of managing the data on the client side, I will create a new database on the SQL Server along with all the tables the tool will interact with.

Then I will have the program insert and delete rows in each table based on the user who is using the tool.

Getting the username is as simple as:

import os

# os.getlogin() can fail when there is no controlling terminal;
# getpass.getuser() is a more portable alternative.
print(os.getlogin())

As there will, 99% of the time, be fewer than 100 rows being added and deleted for any of the roughly 10 users at a time, this will be efficient enough to handle the work.

This solution is far more efficient than my current UNION method and will also allow each user to only see and work with data related to their login.

Seeing that each table will likely never exceed a total of 10,000 rows, I don't think this will be much of an issue even without a primary key to work with.
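Under that design, the per-user bookkeeping boils down to a pair of parameterized statements keyed on the login name. A sketch (the table and column names are illustrative, not from an actual schema, and the pyodbc cursor is assumed and left commented out):

```python
import getpass

user = getpass.getuser()  # portable stand-in for os.getlogin()

# Hypothetical per-user working table
delete_sql = "DELETE FROM user_work WHERE username = ?"
insert_sql = "INSERT INTO user_work (username, id, type, other) VALUES (?, ?, ?, ?)"

rows = [(user, 1, "C", "S"), (user, 2, "C", "O")]
# cursor.execute(delete_sql, (user,))   # clear this user's previous rows
# cursor.executemany(insert_sql, rows)  # load the current batch
print(delete_sql)
```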

