Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch housing dataset to wine #17

Closed
wants to merge 3 commits into from
Closed

switch housing dataset to wine #17

wants to merge 3 commits into from

Conversation

Ivanidzo4ka
Copy link
Contributor

This commit replaces housing dataset to wine dataset which we download during build from external source

@Ivanidzo4ka Ivanidzo4ka changed the title Ivanidze/switch housing dataset to wine switch housing dataset to wine May 4, 2018
@Ivanidzo4ka
Copy link
Contributor Author

Humble attempt to resolve #3 issue. Still not sure what to do with kc_housing since it's behind Kaggle authorization

@eerhardt eerhardt requested a review from codemzs May 7, 2018 23:39


<ItemGroup>
<TestFile Include="d" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Can you make the "Include" a better name?

<TestFile Include="winequality" Url="[https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv](https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv)"/>

@@ -738,6 +738,7 @@ public void EntryPointTextToKeyToText()
'Name': 'Data.TextLoader',
'Inputs': {{
'InputFile': '$file'
{8}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this already existed before you change, but what do you think about using string interpolation here? I think it would make this so much more readable:

'Inputs': {{
                        'InputFile': '$file'
                        {string.IsNullOrWhiteSpace(loader) ? "" : string.Format(",'CustomSchema': '{0}'", loader)}

@@ -855,15 +857,16 @@ public void EntryPointEvaluateMultiClass()
Assert.Equal(3, CountRows(loader));
}

[Fact(Skip = "Missing data set. See https://github.com/dotnet/machinelearning/issues/3")]
[Fact()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) no need for the parens. You can remove ().

<TestFile Include="d" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"/>
</ItemGroup>

<Target Name="DownloadExternalTestFiles" Condition="'$(RunTests)'=='true'">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should always do this, not just when RunTests is on. That way I can still run tests from VS Test Explorer even if I just "build.cmd" from command line.

However, we should also have "Inputs and Outputs" hooked up, so that way the target is skipped if the files already exist. check out https://msdn.microsoft.com/en-us/library/ms171483.aspx for how to make incremental builds work. Let me know if you need help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you can help me with that?

     <TestFile Include="winequality-white.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv">
         <Result>$(MSBuildThisFileDirectory)test\data\external\winequality-white.csv</Result>
     </TestFile>
      <TestFile Include="winequality-red.csv" Url="https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv">
         <Result>$(MSBuildThisFileDirectory)test\data\external\winequality-red.csv</Result>
     </TestFile>
  </ItemGroup>
  
  <Target Name="DownloadExternalTestFiles" Inputs="@(TestFile)" Outputs="%(TestFile.Result)">
    <Message Importance="High" Text="Downloading external test files... %(TestFile.Result)" />
      <DownloadFilesFromUrl Items="@(TestFile)"
                           DestinationDir="test/data/external"
                           TreatErrorsAsWarnings="true"/>
  </Target>

It keeps download files and i'm not sure how to debug it properly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eerhardt Can you help me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way Inputs and Outputs work is based on file timestamps. You need an actual Input file, so MSBuild can compare its last write time to the output file's last write time.

Here when you specify:

Inputs="@(TestFile)"

It uses the TestFile's ItemSpec (what is put into the Include attribute). I'm assuming we don't have files named "winequality-white.csv", so MSBuild probably decides to always do the task, since it can't find the input file.

Another option is to check if the local file already exists, and if it does, then don't do the download.
Here's that approach we do in a different repo that is doing a similar thing (downloading files from the internet during the build, and caching them so they aren't downloaded for every build invocation):

https://github.com/dotnet/cli/blob/69062dda4edd7670cffa5c23876b38eccc130e09/build/Prepare.targets#L36-L56

The above will download any URL in the @(_DownloadAndExtractItem) Item. And then where those items are defined:

https://github.com/dotnet/cli/blob/69062dda4edd7670cffa5c23876b38eccc130e09/build/BundledRuntimes.props#L61-L66

The item is "Condition"d on whether the file exists or not:

    <_DownloadAndExtractItem Include="CombinedSharedHostAndFrameworkArchive"
                             Condition="!Exists('$(CombinedSharedHostAndFrameworkArchive)')">
      <Url>$(CoreSetupRootUrl)$(MicrosoftNETCoreAppPackageVersion)/$(CombinedFrameworkHostCompressedFileName)$(CoreSetupBlobAccessTokenParam)</Url>
      <DownloadFileName>$(CombinedSharedHostAndFrameworkArchive)</DownloadFileName>
      <ExtractDestination>$(SharedFrameworkPublishDirectory)</ExtractDestination>
    </_DownloadAndExtractItem>

So if the CombinedSharedHostAndFrameworkArchive file already exists, then this item isn't included. Thus, when all the files already exist locally, nothing is downloaded.

@@ -33,6 +33,7 @@
RestoreProjects;
BuildNative;
$(TraversalBuildDependsOn);
DownloadExternalTestFiles;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this thing download a file on every build or only if the file is not present already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to make it download file only once, but struggle for right now, see Eric comment and my reply.


<Target Name="DownloadExternalTestFiles" Condition="'$(RunTests)'=='true'">
<Message Importance="High" Text="Downloading external test files..." />
<DownloadFilesFromUrl Items="@(TestFile)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reasonable way to check the file hash? At some point the UCI site will be broken, and/or we will receive (perhaps) a partial file.

As we expand on the use of this, there'll be additional ways for it to fail.

@Ivanidzo4ka
Copy link
Contributor Author

I guess I delete my repository and recreate it again, and now PR lost track of origin source, so I need to create new PR #170

@ghost ghost locked as resolved and limited conversation to collaborators Mar 31, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants