What is SQLAlchemy
SQLAlchemy is a Python toolkit that allows developer to use the full range and power of SQL within Python. SQLAlchemy can be really useful when interacting with SQL databases from a within Python Applications and has been used in various projects within Aiimi. The official docs can be found here: https://www.sqlalchemy.org/
Installation
SQLAlchemy can be installed using pip:
pip install sqlalchemy
Basic SQLAlchemy Functions
Importing SQLAlchemy
SQLAlcemy is generally imported using sa:
import sqlalchemy as sa
Connecting To Your SQL Database
To connect to your SQL database you will need your databases connection string, these are different for each database but should look something like the following:
"mssql+pyodbc://@SERVER_NAME/DATABASE?trusted_connection=yes&driver=ODBC+Driver+17+for+SQL+Server"
Once you have you connection string we can use the create_engine function to connect to the database:
engine = sa.create_engine(url=connection_string)
Running Queries
There are a few different ways to run queries in SQLAlchemy. For reading tables, you can either use the SQLAlchemy functions or you can write them as normal SQL within a string.
To use the functions you would write a query like this to select * from you table:
query = table_name.select()
You would then use the execute command to run the query and fetchall to get the results:
conn = engine.connect()
exe = conn.execute(query)
result = exe.fetchall()
If you want to run the code using SQL strings then you can write a query using the text function:
query = sa.text("SELECT * FROM my_table")
Then use the same code as above to execute that query.
Using Pandas
If you are using SQLAlchemy then chances are Pandas will be a good way to manipulate your tables.
To read a table using pandas you can use:
df = pd.read_sql(query=query, con=conn)
Where conn and query are the same as earlier.
Then you can write tables by:
df.to_sql(table_name="MY_TABLE",con=conn)
Optimisations
Updating Engine Settings
SQLAlchemy has a number of settings that can be updated to improve query performance. Some of these settings can be set directly in the create_engine function but some require the use of ````execution_options``` function to be used.
fast_executemany
By default, SQLAlchemy writes tables 1 row at a time, for obvious reasons, this is very inefficient. To allow more than one row at a time we can update the fast_executemany setting to be true.
engine = sa.create_engine(url="connection_string", fast_executemany=True)
Now when using Pandas to_sql function we can define the chunksize and that number of rows will be written at once. The chunksize should be dependent on the table being written so play around with the chunksize to find the most efficient for your table.
df.to_sql(table_name="my_table", con=my_engine, chunksize=1000)
echo
The echo setting in SQLAlchemy serves as a print out of whatever SQL function is being ran. If lots of calls are being made through SQLAlchemy then the terminal will print out all of the queries which can slow the outputs down. If it is not necessary to see the queries being ran, it is recommended to turn echo to false:
engine = sa.create_engine(url="connection_string", echo=False)
stream_results
Reading large tables that do not fit entirely in memory can slow SQLAlchemy down, so to make it more efficient you can change the stream_results setting to true to handle manageable chuncks of data. This setting cannot be set to true on engine creation but can be updated using the execution_options function:
engine = sa.create_engine(url="connection_string")
with engine.connect() as conn:
result = conn.execture_option(stream_results=True).execute(sa.text("SELECT * FROM my_table"))