Unable to extract format width and precision of SAS dataset #54

Closed
rd-thomas-uhren opened this issue Sep 12, 2019 · 2 comments

Comments

@rd-thomas-uhren

I'm unable to extract the format specification; example attached.

I'm using the latest version 2.0.11.

class.zip

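		import java.io.FileInputStream;
		import java.io.IOException;
		import java.io.InputStream;
		import java.util.List;

		import com.epam.parso.Column;
		import com.epam.parso.SasFileReader;
		import com.epam.parso.impl.SasFileReaderImpl;

		public class PrintColumnFormats
		{
			public static void main(String[] args)
			{
				// try-with-resources closes the stream and never hands a null stream to the reader
				try (InputStream in = new FileInputStream("C:\\opt\\sas\\class.sas7bdat"))
				{
					SasFileReader sasFileReader = new SasFileReaderImpl(in);
					List<Column> columns = sasFileReader.getColumns();

					// Print name, format, format width and format precision of every column
					for (Column c : columns)
					{
						System.out.println(c.getName() + " | " + c.getFormat() + " | " + c.getFormat().getWidth() + " | " + c.getFormat().getPrecision());
					}
				}
				catch (IOException e)
				{
					e.printStackTrace();
				}
			}
		}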

Result:

Name | | 0 | 0
Sex | | 0 | 0
Age | | 0 | 0
Height | | 0 | 0
Weight | | 0 | 0

Properties from SAS EG:

image

@xantorohara
Contributor

Another dataset with format information: cookie.zip
image

xantorohara added a commit to xantorohara/parso that referenced this issue Dec 3, 2020
It looks like `SasFileParser.FormatAndLabelSubheader.processSubheader` and the `SasFileConstants`
file should use different offsets to calculate column format information correctly for 64-bit files.

Before the fix:
```java
long COLUMN_FORMAT_WIDTH_OFFSET = 8L;
long COLUMN_FORMAT_PRECISION_OFFSET = 10L;

subheaderOffset + COLUMN_FORMAT_WIDTH_OFFSET + intOrLongLength,
subheaderOffset + COLUMN_FORMAT_PRECISION_OFFSET + intOrLongLength,
```

For 32-bit:
COLUMN_FORMAT_WIDTH_OFFSET + intOrLongLength = 8 + 4 = 12 (correct)
COLUMN_FORMAT_PRECISION_OFFSET + intOrLongLength = 10 + 4 = 14 (correct)

For 64-bit:
COLUMN_FORMAT_WIDTH_OFFSET + intOrLongLength = 8 + 8 = 16 (not correct)
COLUMN_FORMAT_PRECISION_OFFSET + intOrLongLength = 10 + 8 = 18 (not correct)

After the fix:
```java
long COLUMN_FORMAT_WIDTH_OFFSET = 0L;
long COLUMN_FORMAT_PRECISION_OFFSET = 2L;

subheaderOffset + COLUMN_FORMAT_WIDTH_OFFSET + 3 * intOrLongLength,
subheaderOffset + COLUMN_FORMAT_PRECISION_OFFSET + 3 * intOrLongLength,
```

For 32-bit:
COLUMN_FORMAT_WIDTH_OFFSET + 3 * intOrLongLength = 0 + 3 * 4 = 12 (correct and the same as above)
COLUMN_FORMAT_PRECISION_OFFSET + 3 * intOrLongLength = 2 + 3 * 4 = 14 (correct and the same as above)

For 64-bit:
COLUMN_FORMAT_WIDTH_OFFSET + 3 * intOrLongLength = 0 + 3 * 8 = 24 (correct)
COLUMN_FORMAT_PRECISION_OFFSET + 3 * intOrLongLength = 2 + 3 * 8 = 26 (correct)

So the new calculation gives exactly the same offsets for 32-bit files and also corrects them for 64-bit files.
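
The arithmetic above can be checked with a small standalone sketch. It does not use Parso itself: the class name `FormatOffsetSketch` and the helper methods `oldOffset` and `newOffset` are hypothetical names made up for illustration, and only the constant values and the two formulas are taken from the commit message.

```java
// Standalone sketch of the offset arithmetic from the commit message.
// FormatOffsetSketch, oldOffset and newOffset are hypothetical names, not Parso API.
final class FormatOffsetSketch {
    // Constant values before the fix, as quoted above.
    static final long OLD_WIDTH_OFFSET = 8L;
    static final long OLD_PRECISION_OFFSET = 10L;

    // Constant values after the fix, as quoted above.
    static final long NEW_WIDTH_OFFSET = 0L;
    static final long NEW_PRECISION_OFFSET = 2L;

    // Old formula: constant + intOrLongLength.
    static long oldOffset(long constant, int intOrLongLength) {
        return constant + intOrLongLength;
    }

    // New formula: constant + 3 * intOrLongLength.
    static long newOffset(long constant, int intOrLongLength) {
        return constant + 3L * intOrLongLength;
    }

    public static void main(String[] args) {
        // intOrLongLength is 4 for 32-bit files and 8 for 64-bit files.
        for (int intOrLongLength : new int[] {4, 8}) {
            System.out.printf(
                "intOrLongLength=%d: width old=%d new=%d, precision old=%d new=%d%n",
                intOrLongLength,
                oldOffset(OLD_WIDTH_OFFSET, intOrLongLength),
                newOffset(NEW_WIDTH_OFFSET, intOrLongLength),
                oldOffset(OLD_PRECISION_OFFSET, intOrLongLength),
                newOffset(NEW_PRECISION_OFFSET, intOrLongLength));
        }
    }
}
```

For `intOrLongLength = 4` both formulas agree (12 and 14); for `intOrLongLength = 8` the old formula yields 16 and 18 while the new one yields 24 and 26, matching the figures in the commit message.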
printsev pushed a commit that referenced this issue Dec 3, 2020
@printsev
Contributor

printsev commented Dec 3, 2020

Closed, thanks to xantorohara.

@printsev printsev closed this as completed Dec 3, 2020
xantorohara added a commit to xantorohara/parso that referenced this issue Dec 3, 2020
…t-fix

epam#54 Fixed offsets for Column format detection