Saving thousands (even millions) of records in a database through the Django ORM. Is it really a good idea?
Well, I am writing this note as a reminder to myself, and to anyone else who makes the same mistake I did. I have a project where I wrote a management command called init_data to initialize the required data in the system. I needed to create thousands of Django user instances, along with thousands of other model instances.
I was not concerned about how the Django ORM would handle creating all these instances in a loop, until I suffered for it.
I iterated over a range of 1000 and, for each user, created the user along with the related data that user needed. On my local machine it seemed to work fine, but when I ran the command on the server, CPU usage shot past 90% within a minute.
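For context, here is a minimal sketch of what the loop roughly looked like (the model and field names are illustrative, not the actual project code):

```python
from django.contrib.auth.models import User

def init_data():
    for i in range(1000):
        user = User(username=f"user_{i}")
        user.save()  # one INSERT per call
        # ...plus several more related model instances saved for each user
```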
Then I stopped the command and jumped into the Django docs to see what was happening inside the ORM.
So here’s what I learnt. By default, the Django ORM works in autocommit mode: every time I created a record and called its save method, it immediately hit the database to insert that record. Now, in a single iteration I had about 5 save calls, so each iteration hit the database 5 times. Bullshit!
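You can see this for yourself with Django’s query log. This snippet is just an illustration (it only records queries with DEBUG=True), not code from the original command:

```python
from django.contrib.auth.models import User
from django.db import connection, reset_queries

reset_queries()
for i in range(5):
    User.objects.create(username=f"probe_{i}")  # each create() saves immediately

# With DEBUG=True every query is recorded: expect 5 separate INSERT statements.
print(len(connection.queries))
```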
Then I tried another approach: bulk_create. Django has a bulk-create feature, and it took only 0.65 seconds where the previous implementation took about 8.65 seconds. Still, I wanted to skip bulk_create because it has some limitations: it doesn’t call the model’s save method and doesn’t send the pre_save and post_save signals, among other things.
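The bulk_create version looked roughly like this (again, the names are illustrative):

```python
from django.contrib.auth.models import User

users = [User(username=f"user_{i}") for i in range(1000)]
# One batched INSERT; save() is never called and pre_save/post_save are not sent.
User.objects.bulk_create(users)
```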
I wanted Django to hit the database less often, so I used commit_manually. That allowed me to save those records in about 0.86 seconds, which seems quite good, and my CPU usage stayed normal.
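Here is a sketch of the same loop committed only once at the end. I was on an older Django where the commit_manually decorator still existed; on Django 1.6+ it has been replaced by transaction.atomic(), which is what the sketch below assumes:

```python
from django.contrib.auth.models import User
from django.db import transaction

def init_data():
    with transaction.atomic():  # everything inside commits once, at the end
        for i in range(1000):
            user = User(username=f"user_{i}")
            user.save()  # still one INSERT per save, but no per-row commit
            # ...related saves go here as before
```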
So for this scenario I would suggest using commit_manually. If you don’t need save, pre_save or post_save to be called, then bulk_create is even better. Django also exposes raw SQL to the developer if you want to go further.
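If you do go the raw SQL route, a rough sketch using Django’s connection cursor might look like this (the table and column names here are purely illustrative):

```python
from django.db import connection

rows = [(f"user_{i}",) for i in range(1000)]
with connection.cursor() as cursor:
    # executemany sends all parameter sets to the driver in one call.
    cursor.executemany(
        "INSERT INTO myapp_person (name) VALUES (%s)",
        rows,
    )
```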