
8:08 AM
A day off in lieu for some silly hours recently. Finally my brain should be able to get back to functioning again and not farting about with .isclose() :P
 
@roganjosh The .fillna issue was just with pandas and printing the result to console. The openpyxl issue is independent of that
 
Ok, well I wouldn't expect that result even with pandas quirks
(assuming it's pandas). I'd be curious as to whether polars fixes your issue or, at the very least, throws some error that would give you some more indication of what's going on
Specifically, using the from_pandas method. I don't envy anyone maintaining that part of the API
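Something like this, roughly (toy frame rather than your actual data; assumes polars and pyarrow are installed):
import pandas as pd
import polars as pl

pdf = pd.DataFrame({'a': [1, None, 3]})   # stand-in for the real DF
df = pl.from_pandas(pdf)                  # polars infers strict dtypes instead of falling back to "object"
print(df.dtypes)                          # [Float64] for this toy frame; a genuinely mixed column should complain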
 
 
2 hours later…
10:27 AM
Is it possible to make a reference cycle so complicated that the GC doesn't catch it? Because I can't figure out why my objects aren't dying
 
I don't think so. There's just "cycles" and either the GC can detect them or not. I'm not aware of any special-casing for small/trivial cycles.
The usual culprits are accidental global references, such as caches living on classes.
I've also occasionally had background tasks/coroutines keep things alive that I didn't consider.
You could try using gc.get_referrers just before you think you drop the cycle.
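Quick toy sketch of that (made-up class, not your actual objects):
import gc

class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a                       # deliberate reference cycle

suspect = a
del a, b
print(gc.get_referrers(suspect))  # here: the other Node's __dict__ and the module globals still point at it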
 
@roganjosh Thanks for the hint. I might try making an MRE, but I don't think it's worth it. This isn't even the only problem I have with fillna on this DF, the other problem is that it doesn't replace NaT github.com/pandas-dev/pandas/issues/… Interestingly, one person experiences the opposite problem from me there (NaT getting replaced only with non-empty string), but I can't reproduce that either...
 
Welp, turns out it's a lot easier than I expected
import weakref

class Object:
    pass

weak_key_dict = weakref.WeakKeyDictionary()
foo = Object()
bar = Object()

weak_key_dict[foo] = bar
weak_key_dict[bar] = foo

import gc
del foo, bar
gc.collect()
print(len(weak_key_dict))  # 2
So far this project has made me discover bugs in Firefox and in python
 
@xjcl That's why I suggested passing to polars, because it will have to go through a far stricter type parser rather than just "object", which might elucidate what the pandas parser is doing. Creating an MRE would indeed probably be pretty difficult
 
@Aran-Fey I'm decently sure that WeakKeyDictionary is supposed to keep the values alive.
 
10:37 AM
But only as long as the keys are alive?
 
I've become convinced that, in pandas, your column has a dtype that you can access with .dtype, and a secret second dtype that pandas is not telling you about to mess with you
 
I guess it's a bit of a grey area
 
They are alive, since there are strong references to them. 🙃
It's certainly more on the technically and practically correct side of things.
PEP 2475: WeakKeyValueDictionary
Not sure how much work that would be, but it's probably feasible.
 
Could I theoretically write a custom WeakKeyDictionary that behaves as expected in this case?
 
🎱 Signs point to making KeyedRef's key a weakref as well
 
10:45 AM
I'm unsure if this requires special handling in the GC
 
I doubt it. Weakrefs can already handle cycles, and both WeakKeyDictionary and WeakValueDictionary are almost literally dicts with weakref key/values.
This actually looks suitable for an SO question, by the way... 🤔
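Rough sketch of the idea, just to see it's feasible (class name made up, not a drop-in WeakKeyDictionary replacement):
import weakref

class WeakWeakDictionary:
    """Toy mapping that holds weak references to both keys and values."""

    def __init__(self):
        self._data = {}  # id(key) -> (weakref to key, weakref to value)

    def __setitem__(self, key, value):
        key_id = id(key)
        def purge(_ref, key_id=key_id):
            self._data.pop(key_id, None)   # drop the entry when either side dies
        self._data[key_id] = (weakref.ref(key, purge), weakref.ref(value, purge))

    def __getitem__(self, key):
        key_ref, value_ref = self._data[id(key)]
        value = value_ref()
        if key_ref() is not key or value is None:
            raise KeyError(key)
        return value

    def __len__(self):
        return len(self._data)

class Object:
    pass

d = WeakWeakDictionary()
foo, bar = Object(), Object()
d[foo] = bar
d[bar] = foo
del foo, bar
print(len(d))  # 0 -- both entries get purged, since no strong references keep the cycle alive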
 
@xjcl I wouldn't doubt that for one minute
There is presumably some metadata buried somewhere to know a string from a list at the very least. It can't keep taking inference passes at the data
Then again, that might be deferred to numpy but now, potentially, arrow
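fwiw, one quick way to peek at the "secret second dtype", i.e. what Python types actually live inside an object column (toy data):
import pandas as pd

df = pd.DataFrame({'x': ['a string', [1, 2, 3], 3.5]})
print(df['x'].dtype)               # object, which tells you almost nothing
print(df['x'].map(type).unique())  # the actual element types: str, list, float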
 
11:08 AM
.fillna doesn't just fill nans but also tries to automatically cast each column to the correct dtype (which pandas calls "downcasting"), which seems like a completely unrelated operation to me. At least they are now changing it (you need to specify "future.no_silent_downcasting")
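Small sketch of opting in, assuming a pandas version where that option exists (2.2-ish):
import pandas as pd

pd.set_option('future.no_silent_downcasting', True)  # opt in to the future behaviour

s = pd.Series([1, 2, None], dtype='object')
print(s.fillna(0).dtype)  # stays object instead of being silently downcast to int64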
 
@MisterMiyagi There already is one: stackoverflow.com/questions/6210024/… (No satisfying answers tho)
 
DenverLawyer9 strikes again. :/
 
@xjcl this is what polars addresses. It will not do implicit type casts
The only thing I'm aware of is an upcast of int to float
import polars as pl

df = pl.DataFrame({'a': [1, 2, 3], 'b': [4., 5., 6.]})
df = df.with_columns(
    c = pl.col('a') * pl.col('b')
)
print(df.dtypes)
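# expected: [Int64, Float64, Float64], since the Int64 column gets upcast to Float64 in the multiplication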
 
 
2 hours later…
1:08 PM
I haven't tested it yet, but I think I've figured out an ok workaround for my WeakKeyDict problem. And it is... to get rid of the dict
foo._child = bar
bar._child = foo
Problem solved. Probably.
 
1:52 PM
polars isn't in our corporate Azure Artifact Feed, so I can't "just" test it; I would need to find out all the indirect dependencies and compatible versions, think of a reason, and then open a ticket to add these packages.
 
2:26 PM
morning cabbages, folks
 
AAB
2:39 PM
cbg all
Was going through some code. Why does a separate REST endpoint need to be created to retrieve/persist data in a database?
The code is FlaskAPI, MongoDB
off the top of my head, a REST endpoint would allow for authentication/authorization, maybe get some data based on request parameters, maybe persist some data
but why not just do it directly on the DB?
What benefit does a REST service have in front of a database? Is this something specific to NoSQL DBs, or is this done for RDBMS too?
I mean, isn't there a chance that endpoint A makes 2 calls to DataEndpoint but, for some reason, the 2nd call arrives before the 1st and messes things up?
Why add another endpoint and increase latency/access to the database?
 
I feel like some context is missing. Is the server a typical webserver where JS -> REST -> DB or something else?
 
3:26 PM
@AAB because of malicious input?
You'll need to show at least some example of what you mean here
 
Ah no response ;_; I was hoping AAB would come back saying "I'm doing backend -> backend not frontend -> backend"
 
3:42 PM
I struggle to read it in any sense other than frontend -> backend and wondering why flask is the middleman, but I could be wrong. We wait with bated breath
 
Agreed
 
 
1 hour later…
AAB
5:00 PM
@roganjosh, @Peilonrayz apologies for the delay in response
well the app is weird in that it has lots of django rest code
which does some stuff, and database writes/reads alone are done via this flask code, which has 3-4 instances running. As for MongoDB, I am not sure how scaled it is
Nginx -> django rest; if it's a db get or post, the django rest API calls the flask API in turn
Some tasks are async, as in the API call doesn't wait for a response; in some cases it does wait. The front end is angular and I don't have access to it, I just have the code. We share an OpenAPI spec of what the input/output looks like and the UI does what they do based on that
seems like it's not something related to python, more like some design choice
Q: Why do people do REST API's instead of DBAL's?

neubert: At the past two companies, I've been at, REST API's exist for querying data via webapp - i.e. instead of having the webapp do SQL directly it calls a REST API and that does the SQL and returns the result. My question is - why is this done? If it was going to be exposed to third-parties I could ...

 
So TL;DR Nginx -> Django -> flask -> DB?
 
AAB
the thing I don't get is why not just use django for both, why flask and django? I guess team choice, I hope it's not like flask is faster than django
@Peilonrayz Yes, for db calls django rest calls the flask API, which gets the data from the DB
Interesting. Seems like as a consumer I get what I ask for, but how it's stored and how effectively it's retrieved from the database are all the DB API's responsibility
 
Unfortunately, answering the question is a lot more complicated. With frontend -> backend the obvious answer is security (plaintext password on client, client has access to your DB, etc). With backend -> backend the rationale can become much murkier. For example, does one team own the Django solution and another the flask solution? Do the creators not trust the people working on Django to interact with the DB?
Does flask have some logging which is needed and is easier to wrap up in a REST endpoint? If I were you I'd ask the implementer(s)
 
AAB
Data is a separate team, database schema is a separate one
Well, we are different teams but the same company
seems like efficient database access/retrieval and not distributing the DataAccessLayer (or writing it again in django) are the benefits
security too, I suppose
 
5:19 PM
Seem like fair inferences
 
AAB
hmm, would be fun if the team writing the queries/creating the schema makes a crappy one and it's just another microservice that increases latency :P
btw is it possible that, say, a django endpoint sends 2 http posts but 1 arrives later than the other for some reason?
is it even possible?
 
6:05 PM
@AAB If you're using synchronous requests then the 1st will block until complete, then the 2nd request will run. With async you can initialise both requests at the same time and the order isn't guaranteed to be stable. The simple approach: if you need one after the other, use sync; if the timing doesn't matter, use async.
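Toy illustration of the ordering point, with threads and sleeps standing in for the HTTP calls (the delays are made up):
import threading
import time

results = []

def fake_request(name, delay):
    time.sleep(delay)      # stand-in for network latency
    results.append(name)

# sync: one after the other, order guaranteed
fake_request('first', 0.2)
fake_request('second', 0.1)
print(results)             # ['first', 'second']

# async/concurrent: fire both at once, arrival order depends on latency
results.clear()
threads = [threading.Thread(target=fake_request, args=('first', 0.2)),
           threading.Thread(target=fake_request, args=('second', 0.1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)             # ['second', 'first'] here, because 'second' was faster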
 
6:27 PM
@AAB it depends also on scale. If the django app is for, like, 10 people calling into a flask app that is beefed up to the max to handle millions of requests a minute, then it would make sense
If the flask app only serves the django app, and nothing else, then I don't really understand the setup
I could definitely see a situation in which the django app is calling into something that is far more of a central resource across other projects, not just this django app. That would be provisioned with more hardware and extra resources in many different ways
Also, you mention latency a lot, but it would depend on the infra behind the two apps. Communication between two AWS services, for example, is trivial. It's yamming fast. There is definitely latency, but on a scale you probably couldn't measure from noise
 
 
3 hours later…
AAB
9:16 PM
@roganjosh maybe other services call it too, not sure on that
A lot of ML-related stuff also makes use of flask, is it faster or something?
 
9:49 PM
@AAB I've gotten my colleagues to use flask for ML as flask is simpler than Django.
 
