Ok, well I wouldn't expect that result even with pandas quirks (assuming it's pandas). I'd be curious as to whether polars fixes your issue or, at the very least, throws some error that would give you some more indication of what's going on.
Specifically, using the from_pandas method. I don't envy anyone maintaining that part of the API
@roganjosh Thanks for the hint. I might try making an MRE, but I don't think it's worth it. This isn't even the only problem I have with fillna on this DF; the other problem is that it doesn't replace NaT: github.com/pandas-dev/pandas/issues/… Interestingly, one person there experiences the opposite problem from me (NaT getting replaced only with a non-empty string), but I can't reproduce that either...
@xjcl That's why I suggested passing it to polars: it will have to go through a far stricter type parser rather than just "object", which might elucidate what the pandas parser is doing. Creating an MRE would indeed probably be pretty difficult
I've become convinced that, in pandas, your column has a dtype that you can access with .dtype, and a secret second dtype that pandas is not telling you about to mess with you
.fillna doesn't just fill NaNs; it also tries to automatically cast each column to the correct dtype (which pandas calls "downcasting"), which seems like a completely unrelated operation to me. At least they are now changing it (you need to specify "future.no_silent_downcasting")
polars isn't in our corporate Azure Artifact Feed, so I can't "just" test it; I would need to find out all the indirect dependencies and compatible versions, think of a reason, and then open a ticket to add these packages.
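For anyone following along, here is a minimal sketch of keeping the fill and the dtype inference as two separate, explicit steps. The toy Series is made up for illustration, and whether the "future.no_silent_downcasting" option exists at all depends on the installed pandas version, so the sketch tolerates it being missing.

```python
import pandas as pd

# Opt in to the non-downcasting behaviour where the option exists.
# Assumption: pandas versions without this option raise on set_option,
# in which case we simply carry on with that version's default behaviour.
try:
    pd.set_option("future.no_silent_downcasting", True)
except Exception:
    pass

# Made-up object column with one missing value.
s = pd.Series([1, None, 3], dtype=object)

# Step 1: fill the missing value only.
filled = s.fillna(0)

# Step 2: ask for dtype inference explicitly instead of relying on fillna.
explicit = filled.infer_objects()

print(filled.dtype, explicit.dtype)
```

With the opt-in active, the first dtype should stay as object, and only the explicit infer_objects call turns the column into integers.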
Was going through some code. Why does a separate REST endpoint need to be created to retrieve/persist data in a database?
The code is a Flask API with MongoDB
Off the top of my head, a REST endpoint would allow for authentication/authorization, maybe fetching some data based on request parameters, maybe persisting some data
but why not just do it directly on the DB?
What benefit does putting a REST service in front of a database have? Is this something specific to NoSQL DBs, or is it done for RDBMSs too?
I mean, isn't there a chance that endpoint A makes 2 calls to DataEndpoint but, for some reason, the 2nd call arrives before the 1st and messes things up?
Why add another endpoint and increase latency/access to the database?
I struggle to read it in any sense other than frontend -> backend and wondering why flask is the middleman, but I could be wrong. We wait with bated breath
@roganjosh, @Peilonrayz apologies for the delay in response
well the app is weird in that it has lots of Django REST code
which does some stuff, and database reads/writes alone are done via this Flask code, which has 3-4 instances running. As for MongoDB, I am not sure how scaled it is
Nginx -> Django REST; on a DB get or post, the Django REST API calls the Flask API in turn
Some tasks are async, as in the API call is made and doesn't wait for a response; in some cases it does wait. The front end is Angular and I don't have access to it, I just have the code. We share an OpenAPI spec of what the input/output looks like, and the UI does what they do based on that
seems like it's not something related to Python, more like some design choice
At the past two companies I've been at, REST APIs exist for querying data via webapp - i.e. instead of having the webapp do SQL directly, it calls a REST API and that does the SQL and returns the result.
My question is - why is this done?
If it was going to be exposed to third-parties I could ...
Then I don't get it: why not just use Django for both? Why Flask and Django? I guess it's a team choice. I hope the reasoning isn't something like "Flask is faster than Django"
@Peilonrayz Yes, for DB calls Django REST calls the Flask API, which gets the data from the DB
Interesting. Seems like, as a consumer, I get what I ask for, but how it is stored and how effectively it is retrieved from the database are all the DB API's responsibility
Unfortunately, answering the question is a lot more complicated. With frontend -> backend the obvious answer is security (plaintext password on the client, client has access to your DB, etc). With backend -> backend the rationale can become much more murky. For example, does one team own the Django solution and another the Flask solution? Do the creators not trust the people working on Django to interact with the DB? Does Flask have some logging which is needed and is easier to wrap up in a REST endpoint? If I were you I'd ask the implementer(s)
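To make the "gatekeeper in front of the DB" idea concrete, here is a rough sketch using only the standard library. Everything in it is hypothetical: sqlite3 stands in for MongoDB, a plain function stands in for the Flask route, and the token table, the `orders` schema and the action names are invented for illustration. It is not how the app discussed above is actually built.

```python
import sqlite3

# Hypothetical token table; a real service would use proper authentication.
API_TOKENS = {"django-service-token": "django"}

# In-memory SQLite stands in for the real database (MongoDB in the discussion).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, owner TEXT, item TEXT)")

# One place to record who touched the data and how.
audit_log = []


def handle_request(token, action, payload):
    """Stand-in for a REST endpoint: authenticate, log, then touch the DB."""
    caller = API_TOKENS.get(token)
    if caller is None:
        return {"status": 401, "error": "unauthorized"}

    audit_log.append((caller, action))

    # Callers only get the operations exposed here, never raw DB access.
    if action == "create_order":
        cur = conn.execute(
            "INSERT INTO orders (owner, item) VALUES (?, ?)",
            (payload["owner"], payload["item"]),
        )
        conn.commit()
        return {"status": 201, "id": cur.lastrowid}
    if action == "list_orders":
        rows = conn.execute(
            "SELECT id, item FROM orders WHERE owner = ?", (payload["owner"],)
        ).fetchall()
        return {"status": 200, "orders": [{"id": r[0], "item": r[1]} for r in rows]}
    return {"status": 400, "error": "unknown action"}


created = handle_request("django-service-token", "create_order", {"owner": "ann", "item": "book"})
listed = handle_request("django-service-token", "list_orders", {"owner": "ann"})
rejected = handle_request("wrong-token", "list_orders", {"owner": "ann"})
print(created, listed, rejected, audit_log)
```

The point of the sketch is only that credentials, logging and the set of allowed operations live in one place, so callers never hold a DB connection themselves.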
@AAB If you're using synchronous requests then the 1st will block until complete, then the 2nd request will run. With async you can initialise both requests at the same time and the order isn't guaranteed to be stable. The simple approach: if you need one after the other, use sync; if the timing doesn't matter, use async.
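A small sketch of that ordering difference, using asyncio from the standard library. `fake_request` is a made-up stand-in for a call to the data endpoint, and the delays are arbitrary values chosen so that the second call finishes sooner than the first.

```python
import asyncio


async def fake_request(name, delay, log):
    # Hypothetical stand-in for an HTTP call; `delay` simulates network/processing time.
    await asyncio.sleep(delay)
    log.append(name)


async def sequential():
    # Each call is awaited before the next starts, so the order is fixed.
    log = []
    await fake_request("first", 0.1, log)
    await fake_request("second", 0.01, log)
    return log


async def concurrent():
    # Both calls start together; whichever finishes sooner is logged first.
    log = []
    await asyncio.gather(
        fake_request("first", 0.1, log),
        fake_request("second", 0.01, log),
    )
    return log


sequential_order = asyncio.run(sequential())
concurrent_order = asyncio.run(concurrent())
print(sequential_order)  # ['first', 'second']
print(concurrent_order)  # ['second', 'first']
```

If the second write depends on the first, awaiting them one after the other (or otherwise enforcing the order on the receiving side) avoids the out-of-order case.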
@AAB it depends also on scale. If the django app is for, like, 10 people calling into a flask app that is beefed up to the max to handle millions of requests a minute, then it would make sense
If the flask app only serves the django app, and nothing else, then I don't really understand the setup
I could definitely see a situation in which the django app is calling into something that is far more of a central resource across other projects, not just this django app. That would be provisioned with more hardware and extra resources in many different ways
Also, you mention latency a lot, but it would depend on the infra behind the two apps. Communication between two AWS services, for example, is trivial. It's yamming fast. There is definitely latency, but on a scale you probably couldn't measure from noise