How to create a function with SQL in Python and create columns?

Question

I'm accessing a Microsoft SQL Server database with pyodbc in Python, and I have many tables named by state and year. I'm trying to build a single pandas.DataFrame from all of them, but I don't know how to write a function that also adds YEAR and STATE columns for each table (I'm using NY2000 as an example). How should I build that function or loop? Sorry for the lack of clarity, it's my first post here :/

import pandas as pd

tables = ("NY2000DX", "NY2001DX", "NY2002DX", "AL2000DX", "AL2001DX", "AL2002DX", "MA2000DX", "MA2001DX", "MA2002DX")
jobs = (55, 120)

# Example for a single table, NY2000DX
query = """SELECT
             ID,
             Job_ID
           FROM {}
           WHERE Job_ID IN {}
        """.format(tables[0], jobs)

NY2000 = pd.read_sql(query, server)  # server: an open pyodbc connection

NY2000["State"] = "NY"
NY2000["Year"] = 2000

My desired result would be a DataFrame with the data from all tables, plus columns specifying State and Year, like:

Year   State   ID   Job_ID
2000   NY      13   55
2001   NY      20   55
2002   NY      25   55
2000   AL      15   120
2001   AL      60   120
2002   AL      45   120

Thanks for the support :)

As an aside, you should not be storing many prefixed and suffixed tables in a relational database. All those tables should be normalized into a single table with state and year indicators. Consider redesigning the database if possible.
– Parfait, Commented Jan 3, 2022 at 19:03
By SQL Database, do you mean SQL Server database? Either way, please tag your DBMS. FYI: no company, including Microsoft, owns the SQL name.
– Parfait, Commented Jan 3, 2022 at 19:05
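Following Parfait's normalization suggestion, one possible stopgap when the schema can't be changed is a single view that stacks every table with UNION ALL and adds the state and year as literal columns. The sketch below only builds the SQL text; the helper build_union_view, the view name AllDX, and the <STATE><YEAR>DX naming pattern are assumptions, not anything from the original post. A view is not normalization in itself, but the same SELECT ... UNION ALL could also feed an INSERT INTO a new combined table.

```python
import re

# Assumed naming pattern: 2-letter state, 4-digit year, then "DX" (e.g. NY2000DX).
TABLE_NAME = re.compile(r"^([A-Z]{2})(\d{4})DX$")


def build_union_view(tables, view_name="AllDX"):
    """Hypothetical helper: return a CREATE VIEW statement stacking all tables."""
    selects = []
    for table in tables:
        match = TABLE_NAME.match(table)
        if match is None:
            # Refuse anything that doesn't look like one of our own table names,
            # since the name is pasted directly into the SQL text.
            raise ValueError(f"unexpected table name: {table}")
        state, year = match.groups()
        selects.append(
            f"SELECT '{state}' AS State, {int(year)} AS Year, ID, Job_ID FROM {table}"
        )
    return f"CREATE VIEW {view_name} AS\n" + "\nUNION ALL\n".join(selects)


print(build_union_view(["NY2000DX", "AL2001DX"]))
```

Once such a view exists, a single pd.read_sql("SELECT * FROM AllDX WHERE Job_ID IN (55, 120)", server) would return the combined DataFrame.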

Jayvee · Accepted Answer · 2022-01-03 19:27:35Z

I agree with the comments about normalising the database, and you haven't posted the table structures either. I'm assuming the only way to know the year and state is from the table name; if so, you can do something along these lines:

import pandas as pd

tables = ["NY2000DX", "NY2001DX", "NY2002DX", "AL2000DX", "AL2001DX", "AL2002DX", "MA2000DX", "MA2001DX", "MA2002DX"]
jobs = (55, 120)  # note: a one-element tuple formats as "(55,)", which is not valid SQL

def readtables(tablename, jobsincluded):
    # Table names look like NY2000DX: characters 0-1 are the state, 2-5 the year.
    # The state is quoted so SQL reads it as a string literal, not a column name.
    query = """SELECT
                 {} AS YEAR,
                 '{}' AS STATE,
                 ID,
                 Job_ID
               FROM {}
               WHERE Job_ID IN {}
            """.format(tablename[2:6], tablename[:2], tablename, jobsincluded)
    return query

frames = []
for table in tables:
    print(readtables(table, jobs))
    # dftable = pd.read_sql(readtables(table, jobs), conn)
    # frames.append(dftable)
# df = pd.concat(frames, ignore_index=True)

Please note that I commented out the actual table reading and the concatenation into the final DataFrame, because I don't have a connection to test against. I just printed the resulting queries as a proof of concept.
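For comparison, here is a sketch of the same idea executed end to end. It assumes an in-memory SQLite database (Python's built-in sqlite3) as a stand-in for SQL Server, with two small hypothetical tables; the helper name readtable is also hypothetical. It differs from the answer in two ways: the job IDs are bound as ? query parameters (the placeholder style both sqlite3 and pyodbc use) instead of formatting the tuple into the string, and the State/Year columns are added on the pandas side with DataFrame.assign. The table name still has to be formatted into the SQL, because identifiers cannot be bound as parameters, so it should only come from your own list.

```python
import sqlite3

import pandas as pd


def readtable(conn, tablename, jobs):
    """Hypothetical helper: read one <STATE><YEAR>DX table and tag its rows."""
    state, year = tablename[:2], int(tablename[2:6])
    # One "?" placeholder per job ID, so a single-job tuple works too.
    placeholders = ", ".join("?" for _ in jobs)
    query = f"SELECT ID, Job_ID FROM {tablename} WHERE Job_ID IN ({placeholders})"
    df = pd.read_sql(query, conn, params=list(jobs))
    # Add the constant columns in pandas and put them first.
    return df.assign(Year=year, State=state)[["Year", "State", "ID", "Job_ID"]]


# Stand-in database (assumption): the real tables live in SQL Server via pyodbc.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NY2000DX (ID INTEGER, Job_ID INTEGER);
    INSERT INTO NY2000DX VALUES (13, 55), (14, 99);
    CREATE TABLE AL2001DX (ID INTEGER, Job_ID INTEGER);
    INSERT INTO AL2001DX VALUES (60, 120);
""")

tables = ["NY2000DX", "AL2001DX"]
jobs = (55, 120)
df = pd.concat([readtable(conn, t, jobs) for t in tables], ignore_index=True)
print(df)
```

With pyodbc you would pass your existing connection instead of the SQLite one; recent pandas versions officially support only SQLAlchemy connectables and sqlite3 connections, so a raw pyodbc connection may trigger a warning.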
