
I am working on a data processing task in an enterprise environment, with Python 3 installed on a client-side Windows jump server. The data, which I need to download regularly from a third-party provider, is accessed via a REST API. The business requirement is that the response body be logged in the most credible way possible, in its original form as received from the third-party provider. That is, it must be convincing that what is logged is what was received from them. So we need a solution that eliminates the possibility that we, on the receiving end, have "manipulated" the data.

Can anyone advise on a bulletproof solution?

Solution 1) I use the Python requests package and make the following request to the API server:

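import logging

import requests

logger = logging.getLogger(__name__)
api_logger = logging.getLogger("api")

def getdata(timeinterval):
    # `url` (the provider's endpoint) is defined elsewhere.
    with requests.Session() as session:
        logger.debug("request started")
        parameters = {"intervals": ["Monthly"]}
        # Elided here: the headers also carry an authorization token that I get from OAuth2.
        headers = ...
        # Note: on a GET, data= sends the parameters in the request body;
        # params= would put them in the query string instead.
        response = session.request("GET", url, data=parameters, headers=headers)
        response.raise_for_status()
        api_logger.debug(f"request headers sent: {response.request.headers}")
        api_logger.debug(f"response headers received: {response.headers}")
        ds = response.json()
        # response.content is the body as raw bytes, before any decoding.
        api_logger.info(f"response body received: {response.content}")
        return ds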

If I use logging with the above parameters, I electronically sign all the files after closing the log file.
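One way Solution 1 could be tightened, as a sketch only: write the raw bytes of the body to their own file before any parsing, and log a SHA-256 digest of those bytes, so that the signed log refers to one exact byte sequence. The names `archive_body` and `archive_dir` and the file naming scheme are hypothetical; `body` would be `response.content` from the request above. This only records what the receiving side stored. On its own it does not prove what the provider sent.

```python
import hashlib
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Tuple

api_logger = logging.getLogger("api")

def archive_body(body: bytes, archive_dir: Path) -> Tuple[Path, str]:
    """Store the raw response bytes untouched; return (path, sha256 hex digest)."""
    digest = hashlib.sha256(body).hexdigest()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: UTC timestamp plus a digest prefix.
    path = archive_dir / f"{stamp}_{digest[:16]}.bin"
    # write_bytes uses binary mode: no decoding and no newline translation.
    path.write_bytes(body)
    api_logger.info(
        "response body archived: file=%s sha256=%s bytes=%d",
        path.name, digest, len(body),
    )
    return path, digest
```

Calling `archive_body(response.content, Path("archive"))` before `response.json()` would keep the stored file independent of whether parsing succeeds.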

Solution 2) The network traffic, including our requests and the responses, is logged. The only problem is that it should be connected to the data processing code somehow, am I right?
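On connecting the recorded bytes to the processing code, one possible approach (a sketch under assumptions, not an established procedure) is to have the processing step read only from a stored copy of the raw body and refuse to run unless its SHA-256 digest matches a digest recorded when the body was received. `load_verified` and its parameters are hypothetical names; the expected digest might come from a signed log entry or be computed from the body in the network capture.

```python
import hashlib
import hmac
import json
from pathlib import Path

def load_verified(path: Path, expected_sha256: str):
    """Parse the stored body only if it still matches the recorded digest."""
    raw = path.read_bytes()
    actual = hashlib.sha256(raw).hexdigest()
    # Compare hex digests; a mismatch means the stored bytes have changed.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"digest mismatch for {path.name}: got {actual}")
    # json.loads accepts bytes, so the parsed data comes from the verified bytes.
    return json.loads(raw)
```

This ties the parsed data to one specific byte sequence, but it still relies on the digest record itself being trustworthy.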

Solution 3) ?

Many thanks in advance!

  • "the most credible solution .... it must be convincing that what is logged is what is received from them" - convincing to whom? Also, if this is about preventing "manipulation" on your end, it is not sufficient to prove the download; you need proof for all steps of your processing (download file, all applications used to process it, any environment which might affect processing, ...), since manipulation could happen at every step. Commented Jun 21 at 17:00
  • Thank you for the quick reply. In fact, the purpose is to be able to show the third-party data provider that what we downloaded from them was the state available at the time of download, and that we did not modify the data afterwards. This situation is related to so-called non-repudiation.
    – gale44
    Commented Jun 25 at 19:03
  • If they want to have control over what you've downloaded then they should simply provide authenticated access and log your access + what version of the file they served. But again, just because you've downloaded the file does not prove that you actually used the unmodified file or the downloaded file at all further in your program. Commented Jun 25 at 20:17
  • Unfortunately, we, the data consumers, have to prove to the data provider what has been delivered to us. And that's because they might show us something else. Total lack of trust, and tricky partners :(
    – gale44
    Commented Jul 2 at 10:45
  • You cannot show them what has been delivered to you, since you could simply make this up and claim it was delivered to you. So the server providing the data needs to be somehow involved in this proof, for example by signing the content. HTTPS does not provide this on its own. And again, none of this proves that this was the content you used as input for further processing; it only proves that you had access to it. Commented Jul 2 at 12:44
