I'm accessing a Microsoft SQL Server database with pyodbc in Python. The database has many tables, one per state and year. I'm trying to build a single pandas.DataFrame from all of them, but I don't know how to write a function that reads every table and also adds YEAR and STATE columns for each one (I'm using NY2000 as an example). How should I build that function or loop? Sorry for the lack of clarity, it's my first post here :/

import pandas as pd

# Table names are strings; each one encodes a state and a year.
tables = ("NY2000DX", "NY2001DX", "NY2002DX", "AL2000DX", "AL2001DX", "AL2002DX", "MA2000DX", "MA2001DX", "MA2002DX")
jobs = (55, 120)

query = """ SELECT
             ID,
             Job_ID
             FROM {}
             WHERE Job_ID IN {}
            """.format(tables, jobs)

NY2000 = pd.read_sql(query, server)

NY2000["State"] = "NY"
NY2000["Year"] = 2000

My desired result would be a DataFrame with the rows from all tables, plus columns specifying State and Year, like:

Year  State  ID  Job_ID
----  -----  --  ------
2000  NY     13  55
2001  NY     20  55
2002  NY     25  55
2000  AL     15  120
2001  AL     60  120
2002  AL     45  120

Thanks for the support :)

  • Aside, you should not be storing many prefixed and suffixed tables in a relational database. All those tables should be normalized into a single table with state and year indicators. Consider redesigning the database if possible. Commented Jan 3, 2022 at 19:03
  • By SQL Database, do you mean SQL Server Database? Either way, please tag your DBMS. FYI: No company, including Microsoft, owns the SQL name. Commented Jan 3, 2022 at 19:05

1 Answer

I agree with the comments about normalising the database. You haven't posted the table structures either, so I'm assuming the only way to know the year and state is from the table name. If so, you can do something along these lines:

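import pandas as pd

df = pd.DataFrame({"Year": [], "State": [], "ID": [], "Job_ID": []})
tables = ["NY2000DX", "NY2001DX", "NY2002DX", "AL2000DX", "AL2001DX", "AL2002DX", "MA2000DX", "MA2001DX", "MA2002DX"]
jobs = (55, 120)

def readtables(tablename, jobsincluded):
    # Table names look like NY2000DX: characters 0-1 are the state, 2-5 the year.
    # The state is quoted so SQL treats it as a string literal, not a column name.
    query = """ SELECT
             {} AS Year,
             '{}' AS State,
             ID,
             Job_ID
             FROM {}
             WHERE Job_ID IN {}
            """.format(tablename[2:6], tablename[:2], tablename, jobsincluded)
    return query

for table in tables:
    print(readtables(table, jobs))
    # Pass the generated query itself (not a string containing the call) to read_sql.
    #dftable = pd.read_sql(readtables(table, jobs), conn)
    # pd.concat is a function that takes a list of frames.
    #df = pd.concat([df, dftable], ignore_index=True)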

Please note that I commented out the actual table reading and the concatenation into the final DataFrame, as I don't have a connection to test against. I just printed the resulting queries as a proof of concept.
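
For what it's worth, here is a minimal end-to-end sketch of the same idea that can be run without a SQL Server connection. It makes several assumptions that are not in the original post: an in-memory SQLite database stands in for the real server, the two tables and their rows are made up, and the names TABLE_NAME, read_table and read_all are hypothetical. It checks each table name against the NY2000DX pattern before putting it into the SQL text, passes the job IDs as ? parameters (pyodbc also uses ? placeholders), and adds Year and State in pandas rather than in the query.

```python
import re
import sqlite3

import pandas as pd

# Stand-in database (assumption): in-memory SQLite with two made-up tables.
conn = sqlite3.connect(":memory:")
sample_tables = {
    "NY2000DX": [(13, 55), (14, 70)],
    "AL2001DX": [(60, 120)],
}
for name, rows in sample_tables.items():
    conn.execute(f"CREATE TABLE {name} (ID INTEGER, Job_ID INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

# Two-letter state, four-digit year, then the DX suffix (e.g. NY2000DX).
TABLE_NAME = re.compile(r"([A-Z]{2})(\d{4})DX")


def read_table(table, jobs, conn):
    """Read one table and prepend Year and State parsed from its name."""
    match = TABLE_NAME.fullmatch(table)
    if match is None:
        # Table names cannot be bound as parameters, so reject anything odd.
        raise ValueError(f"unexpected table name: {table!r}")
    state, year = match.group(1), int(match.group(2))

    # One "?" per job ID; the values themselves are bound by the driver.
    placeholders = ", ".join("?" for _ in jobs)
    query = f"SELECT ID, Job_ID FROM {table} WHERE Job_ID IN ({placeholders})"
    frame = pd.read_sql(query, conn, params=list(jobs))

    # Insert State first, then Year, so the order is Year, State, ID, Job_ID.
    frame.insert(0, "State", state)
    frame.insert(0, "Year", year)
    return frame


def read_all(tables, jobs, conn):
    """Concatenate the per-table frames into a single DataFrame."""
    frames = [read_table(table, jobs, conn) for table in tables]
    return pd.concat(frames, ignore_index=True)


result = read_all(["NY2000DX", "AL2001DX"], (55, 120), conn)
print(result)
```

Collecting the frames in a list and calling pd.concat once avoids re-copying the growing DataFrame on every iteration. With the real database, conn would be the pyodbc connection and the list of table names would be the full one from the question.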
