
8:08 AM
A day off in lieu for some silly hours recently. Finally my brain should be able to get back to functioning again and not farting about with .isclose() :P
 
@roganjosh The .fillna issue was just with pandas and printing the result to console. The openpyxl issue is independent of that
 
Ok, well I wouldn't expect that result even with pandas quirks
(assuming it's pandas). I'd be curious as to whether polars fixes your issue or, at the very least, throws some error that would give you some more indication of what's going on
Specifically, using the from_pandas method. I don't envy anyone maintaining that part of the API
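Something like this, roughly (toy frame rather than your actual data; assumes polars and pyarrow are installed):
import pandas as pd
import polars as pl

pdf = pd.DataFrame({'a': [1, None, 3]})   # stand-in for the real DF
df = pl.from_pandas(pdf)                  # polars infers strict dtypes instead of falling back to "object"
print(df.dtypes)                          # [Float64] for this toy frame; a genuinely mixed column should complain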
 
 
2 hours later…
10:27 AM
Is it possible to make a reference cycle so complicated that the GC doesn't catch it? Because I can't figure out why my objects aren't dying
 
I don't think so. There's just "cycles" and either the GC can detect them or not. I'm not aware of any special-casing for small/trivial cycles.
The usual culprits are accidental global references, such as caches living on classes.
I've also occasionally had background tasks/coroutines keep things alive that I didn't consider.
You could try using gc.get_referrers just before you think you drop the cycle.
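Quick toy sketch of that (made-up class, not your actual objects):
import gc

class Node:
    pass

a = Node()
b = Node()
a.other = b
b.other = a                       # deliberate reference cycle

suspect = a
del a, b
print(gc.get_referrers(suspect))  # here: the other Node's __dict__ and the module globals still point at it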
 
@roganjosh Thanks for the hint. I might try making an MRE, but I don't think it's worth it. This isn't even the only problem I have with fillna on this DF, the other problem is that it doesn't replace NaT github.com/pandas-dev/pandas/issues/… Interestingly, one person experiences the opposite problem from me there (NaT getting replaced only with non-empty string), but I can't reproduce that either...
 
Welp, turns out it's a lot easier than I expected
import weakref

class Object:
    pass

weak_key_dict = weakref.WeakKeyDictionary()
foo = Object()
bar = Object()

weak_key_dict[foo] = bar
weak_key_dict[bar] = foo

import gc
del foo, bar
gc.collect()
print(len(weak_key_dict))  # 2
So far this project has made me discover bugs in Firefox and in python
 
@xjcl That's why I suggested passing to polars, because it will have to go through a far stricter type parser rather than just "object", which might elucidate what the pandas parser is doing. Creating an MRE would indeed probably be pretty difficult
 
@Aran-Fey I'm decently sure that WeakKeyDictionary is supposed to keep the values alive.
 
10:37 AM
But only as long as the keys are alive?
 
I've become convinced that, in pandas, your column has a dtype that you can access with .dtype, and a secret second dtype that pandas is not telling you about to mess with you
 
I guess it's a bit of a grey area
 
They are alive, since there are strong references to them. 🙃
It's certainly more on the technically and practically correct side of things.
PEP 2475: WeakKeyValueDictionary
Not sure how much work that would be, but it's probably feasible.
 
Could I theoretically write a custom WeakKeyDictionary that behaves as expected in this case?
 
🎱 Signs point to making KeyedRef's key a weakref as well
 
10:45 AM
I'm unsure if this requires special handling in the GC
 
I doubt it. Weakrefs can already handle cycles, and both WeakKeyDictionary and WeakValueDictionary are almost literally dicts with weakref key/values.
This actually looks suitable for an SO question, by the way... 🤔
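Rough sketch of the idea, just to see it's feasible (class name made up, not a drop-in WeakKeyDictionary replacement):
import weakref

class WeakWeakDictionary:
    """Toy mapping that holds weak references to both keys and values."""

    def __init__(self):
        self._data = {}  # id(key) -> (weakref to key, weakref to value)

    def __setitem__(self, key, value):
        key_id = id(key)
        def purge(_ref, key_id=key_id):
            self._data.pop(key_id, None)   # drop the entry when either side dies
        self._data[key_id] = (weakref.ref(key, purge), weakref.ref(value, purge))

    def __getitem__(self, key):
        key_ref, value_ref = self._data[id(key)]
        value = value_ref()
        if key_ref() is not key or value is None:
            raise KeyError(key)
        return value

    def __len__(self):
        return len(self._data)

class Object:
    pass

d = WeakWeakDictionary()
foo, bar = Object(), Object()
d[foo] = bar
d[bar] = foo
del foo, bar
print(len(d))  # 0 -- both entries get purged, since no strong references keep the cycle alive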
 
@xjcl I wouldn't doubt that for one minute
There is presumably some metadata buried somewhere to know a string from a list at the very least. It can't keep taking inference passes at the data
Then again, that might be deferred to numpy but now, potentially, arrow
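fwiw, one quick way to peek at the "secret second dtype", i.e. what Python types actually live inside an object column (toy data):
import pandas as pd

df = pd.DataFrame({'x': ['a string', [1, 2, 3], 3.5]})
print(df['x'].dtype)               # object, which tells you almost nothing
print(df['x'].map(type).unique())  # the actual element types: str, list, float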
 
11:08 AM
.fillna doesn't just fill nans but also tries to automatically cast each column to the correct dtype (which pandas calls "downcasting"), which seems like a completely unrelated operation to me. At least they are now changing it (you need to specify "future.no_silent_downcasting")
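Small sketch of opting in, assuming a pandas version where that option exists (2.2-ish):
import pandas as pd

pd.set_option('future.no_silent_downcasting', True)  # opt in to the future behaviour

s = pd.Series([1, 2, None], dtype='object')
print(s.fillna(0).dtype)  # stays object instead of being silently downcast to int64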
 
@MisterMiyagi There already is one: stackoverflow.com/questions/6210024/… (No satisfying answers tho)
 
DenverLawyer9 strikes again. :/
 
@xjcl this is what polars addresses. It will not do implicit type casts
The only thing I'm aware of is an upcast of int to float
import polars as pl

df = pl.DataFrame({'a': [1, 2, 3], 'b': [4., 5., 6.]})
df = df.with_columns(
    c = pl.col('a') * pl.col('b')
)
print(df.dtypes)
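# expected: [Int64, Float64, Float64], since the Int64 column gets upcast to Float64 in the multiplication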
 
 
2 hours later…
1:08 PM
I haven't tested it yet, but I think I've figured out an ok workaround for my WeakKeyDict problem. And it is... to get rid of the dict
foo._child = bar
bar._child = foo
Problem solved. Probably.
 
1:52 PM
polars isn't in our corporate Azure Artifact Feed, so I can't "just" test it; I would need to find out all the indirect dependencies and compatible versions, think of a reason, and then open a ticket to add these packages.
 
2:26 PM
morning cabbages, folks
 
AAB
2:39 PM
cbg all
Was going through some code. Why does a separate REST endpoint need to be created to retrieve/persist data in a database?
The code is FlaskAPI, MongoDB
off the top of my head, a REST endpoint would allow for authentication/authorization, maybe get some data based on request parameters, maybe persist some data
but why not just do it directly on the DB?
What benefit does a REST service have in front of a database? Is this something specific to NoSQL DBs, or is this done for RDBMS too?
I mean, isn't there a chance that endpoint A makes 2 calls to DataEndpoint but, for some reason, the 2nd call arrives before the 1st and messes things up?
Why add another endpoint and increase latency/access to the database?
 
I feel like some context is missing. Is the server a typical webserver where JS -> REST -> DB or something else?
 
3:26 PM
@AAB because of malicious input?
You'll need to show at least some example of what you mean here
 
Ah no response ;_; I was hoping AAB would come back saying "I'm doing backend -> backend not frontend -> backend"
 
3:42 PM
I struggle to read it in any sense other than frontend -> backend and wondering why flask is the middleman, but I could be wrong. We wait with bated breath
 
Agreed
 
 
1 hour later…
AAB
5:00 PM
@roganjosh, @Peilonrayz apologies for the delay in response
well the app is weird in that it has lots of django rest code
which does some stuff, and database writes/reads alone are done via this flask code, which has 3-4 instances running. As for MongoDB, I am not sure how scaled it is
Nginx -> django rest; if it's a db get or post, the django rest API calls the flask API in turn
Some tasks are async, as in the API call doesn't wait for a response; in some cases it does wait. The front end is angular and I don't have access to it, I just have the code. We share an OpenAPI spec of what the input/output looks like and the UI does what they do based on that
seems like it's not something related to python, more like some design choice
Q: Why do people do REST API's instead of DBAL's?

neubert: At the past two companies, I've been at, REST API's exist for querying data via webapp - i.e. instead of having the webapp do SQL directly it calls a REST API and that does the SQL and returns the result. My question is - why is this done? If it was going to be exposed to third-parties I could ...

 
So TL;DR Nginx -> Django -> flask -> DB?
 
AAB
the thing I don't get is why not just use django for both, why flask and django? I guess team choice, I hope it's not like flask is faster than django
@Peilonrayz Yes, for db calls django rest calls the flask API, which gets the data from the DB
Interesting. Seems like as a consumer I get what I ask for, but how it's stored and how effectively it's retrieved from the database are all the DB API's responsibility
 
Unfortunately, answering the question is a lot more complicated. With frontend -> backend the obvious answer is security (plaintext password on client, client has access to your DB, etc). With backend -> backend the rationale can become much murkier. For example, does one team own the Django solution and another the flask solution? Do the creators not trust the people working on Django to interact with the DB?
Does flask have some logging which is needed and is easier to wrap up in a REST endpoint? If I were you I'd ask the implementer(s)
 
AAB
Data is a separate team, database schema is a separate one
Well, we are different teams but the same company
seems like efficient database access/retrieval and not distributing the DataAccessLayer (or writing it again in django) are the benefits
security too, I suppose
 
5:19 PM
Seem like fair inferences
 
AAB
hmm, would be fun if the team writing the queries/creating the schema makes a crappy one and it's just another microservice that increases latency :P
btw is it possible that, say, a django endpoint sends 2 http posts but 1 arrives later than the other for some reason?
is it even possible?
 
6:05 PM
@AAB If you're using synchronous requests then the 1st will block until complete, then the 2nd request will run. With async you can initialise both requests at the same time and the order isn't guaranteed to be stable. The simple approach: if you need one after the other, use sync; if the timing doesn't matter, use async.
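Toy illustration of the ordering point, with threads and sleeps standing in for the HTTP calls (the delays are made up):
import threading
import time

results = []

def fake_request(name, delay):
    time.sleep(delay)      # stand-in for network latency
    results.append(name)

# sync: one after the other, order guaranteed
fake_request('first', 0.2)
fake_request('second', 0.1)
print(results)             # ['first', 'second']

# async/concurrent: fire both at once, arrival order depends on latency
results.clear()
threads = [threading.Thread(target=fake_request, args=('first', 0.2)),
           threading.Thread(target=fake_request, args=('second', 0.1))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)             # ['second', 'first'] here, because 'second' was faster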
 
6:27 PM
@AAB it depends also on scale. If the django app is for, like, 10 people calling into a flask app that is beefed up to the max to handle millions of requests a minute, then it would make sense
If the flask app only serves the django app, and nothing else, then I don't really understand the setup
I could definitely see a situation in which the django app is calling into something that is far more of a central resource across other projects, not just this django app. That would be provisioned with more hardware and extra resources in many different ways
Also, you mention latency a lot, but it would depend on the infra behind the two apps. Communication between two AWS services, for example, is trivial. It's yamming fast. There is definitely latency, but on a scale you probably couldn't measure from noise
 
 
3 hours later…
AAB
9:16 PM
@roganjosh maybe other services call it too, not sure on that
A lot of ML-related stuff also makes use of flask, is it faster or something?
 
9:49 PM
@AAB I've gotten my colleagues to use flask for ML as flask is simpler than Django.
 
