
import datetime
import os
from dateutil.relativedelta import relativedelta
from dateutil.rrule import rrule, DAILY, WEEKLY, MONTHLY
from pathlib import Path
import ntpath

from dselib.dse_tools import dseCommon

"""
File_selector: tools for getting a list of files using a PathPattern and
a range attribute. The range attribute can be either an integer range or a
date range.

When working with environmental raster data, we often work with a
collection of files where the parameter name and date are part of the file
name. These tools help the user build the list of files for a given set of
parameters within a date range.

Environmental data are often provided as raster images (e.g. satellite images)
by data providers such as NOAA, ECMWF, NCAR and NASA. These data
are grouped into data sets that serve various research disciplines.
Each data set covers a set of parameters, a date range and a location,
and any given raster file is constructed from a subset of these
partitions.

PathPattern and filtering files:

Typically, when data is downloaded, it is organized into a hierarchical directory
structure, and the file names follow a naming convention, the path pattern. This
directory structure and naming convention can be modeled as:
        1) root directory - the top-level folder that contains the data sets
        2) relative directory - a subdirectory that contains the
           specific data set
        3) pattern - describes the parameter and data organization
           and the file naming convention. The pattern can have one or more keys
           whose values are substituted to select a set of files

For example, consider a data set containing temperature, humidity and rainfall estimates,
where each parameter (e.g. rainfall) is located in its own
dataset subdirectory (e.g. 'rainfall'), and the file naming convention is
<param prefix> followed by the date as YYYY_mm_dd followed by '.nc' (e.g. rfe2010_01_15.nc).
We could describe a filter that selects only rainfall files with the PathPattern:
    '{variable}/{prefix}%Y_%m_%d.nc'
where "variable" is replaced with 'rainfall', "prefix" is replaced with 'rfe',
and '%Y_%m_%d' is the date format (e.g. 2010_01_15 for date=2010/01/15).
FileSelector could then use this PathPattern to select all rainfall files for 2010 to 2015.

If the data set were partitioned by variable and year, a PathPattern of
    '{variable}/%Y/{prefix}%Y_%m_%d.nc'
would be used.
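
The two-step substitution described above can be sketched with the standard
library alone: keys are filled in with str.format, and the date codes are
expanded afterwards with strftime. The helper name expand_pattern is
hypothetical and is not part of this module.

```python
import datetime


def expand_pattern(pattern, date, **keys):
    # Step 1: replace '{key}' fields; the '%' date codes pass through unchanged.
    time_format = pattern.format(**keys)
    # Step 2: expand the '%Y', '%m', '%d' codes for one specific date.
    return date.strftime(time_format)


pattern = '{variable}/%Y/{prefix}%Y_%m_%d.nc'
name = expand_pattern(pattern, datetime.date(2010, 1, 15),
                      variable='rainfall', prefix='rfe')
print(name)  # rainfall/2010/rfe2010_01_15.nc
```

Filling in the keys first leaves a plain strftime format string, which can be
reused for every date in a range.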

In these examples, 'variable' and 'prefix' are the keys, written as '{<key>}' in the pattern.

A similar scheme is used to generate output file names.

The class PathPattern is a container class for storing and manipulating
path patterns.

PathPattern currently does not support regular expressions.

FileSelector:

Another common problem is the need to link multiple input files to an output file.
Again using the example of climate data: rainfall readings are often reported hourly, and each
is the accumulation of rainfall over the past hour. These readings are bundled into one file per day
with a time range of 0 to 23 hours. In order to compute the daily total rainfall, we need
hours 1 through 23 of <day> and hour 0 of <day + 1>. We may also want to aggregate data by
month or quarter. We can think of this type of problem as 'vertical', in that we have a target
that requires some number of inputs identified by time.
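
As a minimal sketch of the 'vertical' case, the snippet below pairs each daily
target with the input files it needs: the file for the day itself plus one
extra day, mirroring the hour-0 requirement described above. The function name
daily_inputs and the file name formats are illustrative assumptions, not this
module's API.

```python
import datetime


def daily_inputs(start, end, in_format, out_format, extra_days=1):
    # Map each daily target name to the list of input names it depends on.
    targets = {}
    day = start
    while day <= end:
        inputs = [(day + datetime.timedelta(days=i)).strftime(in_format)
                  for i in range(1 + extra_days)]
        targets[day.strftime(out_format)] = inputs
        day += datetime.timedelta(days=1)
    return targets


files = daily_inputs(datetime.date(2010, 1, 30), datetime.date(2010, 1, 31),
                     'rfe%Y_%m_%d.nc', 'rfe_total%Y_%m_%d.nc')
for target, inputs in files.items():
    print(target, '<-', inputs)
```

The last target of January needs the first file of February, which is why the
extra day is computed with date arithmetic and not by editing the file name.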

We also have the problem of computing parameters from other parameters. For example, humidity is
calculated from the air temperature and the dew point. In this case we need to construct the
target horizontally, or by 'parameter'.

FileSelector uses the input and output patterns to construct a dictionary of targets and their
corresponding input files. FileSelector provides horizontal and vertical generation for a target.
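
The 'horizontal' case can be sketched in the same spirit: one target per day,
built from several parameters for that same day. The pattern strings and the
parameter names 'airtemp', 'dewpoint' and 'humidity' are assumptions chosen for
illustration, and the sketch uses plain string formatting, not the classes
defined in this module.

```python
import datetime

input_pattern = '{variable}/{variable}%Y%j.nc'
target_pattern = '{variable}/{variable}%Y%j.nc'

# Fill in the keys first: one input format per source parameter.
input_formats = [input_pattern.format(variable=v)
                 for v in ('airtemp', 'dewpoint')]
target_format = target_pattern.format(variable='humidity')

# Then expand the date codes for each day in the range.
targets = {}
for offset in range(2):
    day = datetime.date(2010, 1, 1) + datetime.timedelta(days=offset)
    targets[day.strftime(target_format)] = [day.strftime(f)
                                            for f in input_formats]

for target, inputs in targets.items():
    print(target, '<-', inputs)
```

Each dictionary entry maps a target file to the inputs required to build it,
which is the same shape of result that FileSelector is described as producing.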
"""

class PathPattern():
    """
    PathPattern:
    Initialization:
        Required inputs:
            root_dir     -> root directory for one or all data sets
            pattern      -> the file name pattern
        Optional inputs:
            relative_dir -> the relative offset from root_dir, can be None
    Methods:
        get_pattern_key_words() - returns a list of keys defined in the PathPattern
        get_number_pattern_keys() - returns the number of keys defined in the PathPattern
        get_pattern_time_format(**kwargs) - returns the time format string (file name only) for a set of kwargs
        get_path_pattern(**kwargs) - returns the pattern with its keys replaced by kwargs

    Usage:
        pattern = '{variable}/%Y/{variable}%Y%j.nc'
        root_dir = 'c:/documents'
        relative_dir = 'data/ERA5/daily'
        path_pattern = PathPattern(root_dir, pattern, relative_dir)
        keys = path_pattern.get_pattern_key_words() # -> returns ['variable']
        nkeys = path_pattern.get_number_pattern_keys() # -> returns 1
        time_format = path_pattern.get_pattern_time_format(variable='airtemp') # -> returns 'airtemp%Y%j.nc'
        file_pattern = path_pattern.get_path_pattern(variable='airtemp') # -> returns 'airtemp/%Y/airtemp%Y%j.nc', relative to path_pattern.root_path

    """

    def __init__(self, root_dir, pattern, relative_dir= None):
        if relative_dir:
            if relative_dir[0] == '/':
                rdir = relative_dir.strip("/")
            else:
                rdir = relative_dir
            self.root_path = dseCommon.norm_join(root_dir, rdir)
        else:
            self.root_path = str(Path(root_dir))
        self.pattern = pattern if pattern[0] != '/' else pattern.strip('/')
        # parse out the key words
        self.keys = PathPattern._get_key_words(pattern)

    @staticmethod
    def _get_key_words(pattern):
        # key words are identified by '{ }' pairs
        # pattern can have zero or more key words
        keys = []
        key_start_char = '{'
        key_end_char = '}'

        key_start = [pos for pos, char in enumerate(pattern) if char == key_start_char]
        key_end = [pos for pos, char in enumerate(pattern) if char == key_end_char]

        # the lengths of key_start and key_end should be the same; if not,
        # raise an exception
        if len(key_start) == len(key_end) and len(key_start) > 0:
            for i in range(len(key_start)):
                new_key = pattern[key_start[i] + 1: key_end[i]]
                if new_key not in keys:
                    keys.append(new_key)
        elif len(key_start) != len(key_end):
            raise KeyError("pattern key word values are not in { }")
        return keys

    def get_pattern_key_words(self):
        return self.keys

    def get_number_pattern_keys(self):
        return len(self.keys)

    def get_pattern_time_format(self, **kwargs):
        # substitute the keys, then keep only the file-name part of the pattern
        time_pattern = self.get_path_pattern(**kwargs)
        base_name = ntpath.basename(time_pattern)
        return base_name

    def get_path_pattern(self, **keys):
        # questions:
        # What if the user specifies no keys (e.g. len(keys) == 0)?
        #       if len(self.keys) == 0 then OK
        #       else throw exception
        # What if the user gives too many keys?
        #       throw exception
        # What if the user does not give enough keys?
        #       throw exception
        # note: str.format() raises KeyError for a missing key and silently
        # ignores extra keys, so only the "not enough keys" case is caught today
        if len(self.keys) == 0:
            return self.pattern
        else:
            return self.pattern.format(**keys)


class FileSelector():
    """
    Class FileSelector - used to build a dictionary of output files with a list of required input file(s) needed.

    Inputs:
        target_pattern = pattern used to generate target file names
        input_pattern = pattern used to generate input file names

    Methods:
        set_target_path_pattern(target_pattern)
        set_input_path_pattern(input_pattern)
        add_target_set(name, **kwargs)
        add_input_set_to_target(target_name, **kwargs)
        target_set_dict = get_target_sets()
        input_set_dict = get_input_sets()
        target_file_path = get_target_file(target_name)
        file_dict = get_sequence_files(start_date, end_date, freq, num_extra_days=0, group_size=1)
            freq is 'D' (daily), 'N' (groups of group_size days), 'W' (weekly),
            'M' (monthly) or 'Q' (quarterly)
        missing_files_dict = check_target_input_files(file_dictionary)
    """

    def __init__(self, target_pattern=None, input_pattern=None):
        # falsy arguments are stored as None, meaning "not set yet"
        self.target_pattern = target_pattern if target_pattern else None
        self.input_pattern = input_pattern if input_pattern else None
        self.target_sets = {}
        self.input_sets = {}
        self.target_files = {}

    def _generate_date_input_file_list(self, dates_generated ):
        files = {}
        # for each target set, generate the list of input files
        # needed by each dated target

        for target_name, target_pattern in self.target_sets.items():
            in_patterns = self.input_sets[target_name]
            for k,v in dates_generated:
                target = k.strftime(target_pattern)
                self.target_files[target] = dseCommon.norm_join(self.target_pattern.root_path, target)
                day_files = []
                for i in range(v):
                    for in_pattern in in_patterns:
                        day = k + datetime.timedelta(days=i)
                        day_files.append(day.strftime(in_pattern))
                files[target] = [dseCommon.norm_join(self.input_pattern.root_path, p) for p in day_files]
        return files

    def set_target_path_pattern(self, target_pattern):
        if isinstance(target_pattern, PathPattern):
            self.target_pattern = target_pattern
        else:
            raise RuntimeError("set_target_path_pattern -> target_pattern not a PathPattern")

    def set_input_path_pattern(self, input_pattern):
        if isinstance(input_pattern, PathPattern):
            self.input_pattern = input_pattern
        else:
            raise RuntimeError("set_input_path_pattern -> input_pattern not a PathPattern")

    def add_target_set(self, set_name, **kwargs):
        # generate a target date formatting string named set_name
        # and add it to the target_sets dictionary
        #   check to make sure set_name is not None and not empty or blank
        if self.target_pattern is None:
            raise RuntimeError("add_target_set -> target_pattern not set. Use set_target_path_pattern() before calling add_target_set()")
        try:
            self.target_sets[set_name] = self.target_pattern.get_path_pattern(**kwargs)
        except KeyError as err:
            # kwargs is passed positionally so that both '{}' fields are filled
            raise RuntimeError('add_target_set -> keys in pattern {} do not match values passed in kwargs {}'.format(self.target_pattern.pattern, kwargs)) from err

    def add_input_set_to_target(self, target_set_name, **kwargs):
        # generate an input date formatting string and add it to the list of
        # inputs for target_set_name in the input_sets dictionary.
        # Adding multiple inputs to a target provides horizontal bindings
        #   check to make sure target_set_name is in target_sets
        if self.input_pattern is None:
            raise RuntimeError("add_input_set_to_target -> input_pattern not set. Use set_input_path_pattern() before calling add_input_set_to_target()")
        try:
            input_format = self.input_pattern.get_path_pattern(**kwargs)
        except KeyError as err:
            # kwargs is passed positionally so that both '{}' fields are filled
            raise RuntimeError('add_input_set_to_target -> keys in pattern {} do not match values passed in kwargs {}'.format(self.input_pattern.pattern, kwargs)) from err
        # create the list on first use, then append
        self.input_sets.setdefault(target_set_name, []).append(input_format)


    def get_target_sets(self):
        return self.target_sets

    def get_input_sets(self):
        return self.input_sets

    def get_target_file(self,target):
        if target in self.target_files:
            return self.target_files[target]
        else:
            return None


    def get_sequence_files(self, start_date, end_date,freq, num_extra_days=0, group_size=1 ):
        files = None
        dates_generated = []
        dates = None
        step = 0
        if freq in ['D', 'd']:
            step = 1
            dates = [dt for dt in rrule(DAILY, dtstart=start_date, until=end_date)]
            for date in dates[::step]:
                total_days = ((date + relativedelta(days=+step)) - date).days + num_extra_days
                dates_generated.append((date,total_days))
        elif freq in ['N', 'n']:
            # Group the days into groups of group_size days, e.g. group_size=5 gives a 5 day
            # total or average. Note that if the interval is not evenly divisible by
            # group_size, the last group will have fewer days
            step = group_size
            dates = [dt for dt in rrule(DAILY, dtstart=start_date, until=end_date)]
            for date in dates[::step]:
                # days remaining from this date through end_date, inclusive
                days_left = (end_date - date).days + 1
                total_days = min(group_size, days_left) + num_extra_days
                dates_generated.append((date,total_days))
        elif freq in ['W', 'w']:
            # Get the date set in weeks. Python assumes a week starts on Monday (0) and ends on
            # Sunday (6). Note that if the range does not start on a Monday or is not evenly
            # divisible by 7, the first and last weeks may be short.
            first_monday = start_date + datetime.timedelta(days=(7-start_date.weekday()) % 7)
            ndays2first_monday = (first_monday - start_date).days
            dates = [dt for dt in rrule(WEEKLY, dtstart=first_monday, until=end_date)]
            group_size = 7
            if ndays2first_monday > 0:
                dates_generated.append((start_date,ndays2first_monday))
            for date in dates:
                # days remaining from this Monday through end_date, inclusive
                days_left = (end_date - date).days + 1
                total_days = min(group_size, days_left) + num_extra_days
                dates_generated.append((date,total_days))
        elif freq in ['M','m']:
            step = 1
            dates = [dt for dt in rrule(MONTHLY, dtstart=start_date, until=end_date)]
            for date in dates[::step]:
                total_days = ((date + relativedelta(months=+step)) - date).days + num_extra_days
                dates_generated.append((date,total_days))
        elif freq in ['Q', 'q']:
            step = 3
            dates = [dt for dt in rrule(MONTHLY, dtstart=start_date, until=end_date)]
            for date in dates[::step]:
                total_days = ((date + relativedelta(months=+step)) - date).days + num_extra_days
                dates_generated.append((date,total_days))
        else:
            raise ValueError('Invalid time sequence interval, must be "D" for daily, "N" for groups of N days, "W" for weekly, "M" for monthly or "Q" for quarterly')

        files = self._generate_date_input_file_list( dates_generated)
        return files

    def check_target_input_files(self,files):
        # files is a dictionary indexed by targets
        # this routine processes the dictionary to see if the input files exist
        # and returns a dictionary of targets and missing files
        # if no files are missing, None is returned
        rtn = None
        missing_dict = {}
        for target_file, input_files in files.items():
            # Path.exists() is used because Path.resolve() does not raise
            # FileNotFoundError for missing files on Python 3.6+
            missing_files = [f for f in input_files if not Path(f).exists()]
            if len(missing_files) > 0:
                missing_dict[target_file] = missing_files
        if len(missing_dict) > 0:
            rtn = missing_dict
        return rtn
