Feature request: add option to disable field type inferencing #351

Open
aborruso opened this issue May 25, 2020 · 6 comments

@aborruso

Hi,
I have an input HTML table in which one field has values such as 06583. If I convert this table to CSV, the value becomes 6583.0.
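
A minimal sketch of why the leading zero disappears, using only built-in Python casts. It does not reproduce the actual inference logic in rows, which this thread does not show; it only illustrates that any numeric cast discards the zero, while text keeps it.

```python
raw_values = ["06583", "09269", "05269"]

# Casting to a numeric type discards the leading zero ...
as_float = [str(float(value)) for value in raw_values]

# ... while keeping the values as text preserves them exactly.
as_text = [str(value) for value in raw_values]

print(as_float)  # ['6583.0', '9269.0', '5269.0']
print(as_text)   # ['06583', '09269', '05269']
```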

It would be great to have a CLI option to disable field type inference completely and output all values as text fields.

Thank you

@turicas
Owner

turicas commented May 25, 2020

For now, you can force the type of a given column to rows.fields.TextField by passing the force_types parameter to the import function: rows.import_from_html(..., force_types={"field_name": rows.fields.TextField}).
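
As a rough sketch of how that could look in a script: the helper name html_to_csv_forcing_text and the file names are hypothetical, and the example assumes rows is installed and that rows.export_to_csv accepts a table and an output path. It has not been checked against a specific rows version.

```python
def html_to_csv_forcing_text(html_path, csv_path, text_fields):
    """Convert an HTML table to CSV, keeping the named fields as text."""
    # Third-party dependency (pip install rows); imported here so the
    # function can be defined even where rows is not installed.
    import rows

    # Map each listed field name to TextField so no numeric cast is applied.
    force_types = {name: rows.fields.TextField for name in text_fields}
    table = rows.import_from_html(html_path, force_types=force_types)
    rows.export_to_csv(table, csv_path)


# Field names are the slugified column headers (assumption: a header
# such as "Importo" becomes "importo").
# html_to_csv_forcing_text("input.html", "output.csv", ["importo"])
```

Only the fields listed in text_fields are forced; every other column still goes through the usual type detection.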

@turicas turicas closed this as completed May 25, 2020
@aborruso
Author

Thank you, Turicas.

My request is about command-line use.
Is it possible to do this via the CLI?

Best regards

@turicas
Owner

turicas commented May 26, 2020

Oh, sorry. Let's reopen it.

@turicas turicas reopened this May 26, 2020
@turicas
Owner

turicas commented May 26, 2020

Regarding the CLI: unfortunately, not all CLI commands support forcing types. Some of them (csv2sqlite, pgimport) support passing a schema file via --schema, but this way you need to specify all field types.
But there's a hack you could use: convert the CSV to SQLite, forcing text (or integer, or whatever type you want), and then convert it back to CSV. Like this:

Let's say you have a file test.csv with the following contents:

a,b
06583,some text
123,more text

Create the file schema.csv with:

field_name,field_type
a,text
b,text

Then, execute the commands:

rows csv2sqlite --schemas=schema.csv test.csv test-temp.sqlite
rows sqlite2csv test-temp.sqlite test test2.csv
rm test-temp.sqlite

and use test2.csv.
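
For readers who want to see what the round trip does, here is a minimal sketch using only the Python standard library. It is not how rows implements csv2sqlite or sqlite2csv; the function name roundtrip_csv_as_text is hypothetical, and every column is simply declared as TEXT instead of being read from a schema file.

```python
import csv
import sqlite3


def roundtrip_csv_as_text(src_csv, dst_csv, table="test"):
    """Load a CSV into SQLite with TEXT columns, then dump it back to CSV."""
    with open(src_csv, newline="", encoding="utf-8") as fobj:
        reader = csv.reader(fobj)
        header = next(reader)
        data = list(reader)

    # Quote identifiers so unusual column names stay valid SQL.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
        conn.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data
        )
        cursor = conn.execute('SELECT * FROM "{}"'.format(table))
        with open(dst_csv, "w", newline="", encoding="utf-8") as fobj:
            writer = csv.writer(fobj)
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        conn.close()


# roundtrip_csv_as_text("test.csv", "test2.csv")
```

Because the values are inserted as strings into TEXT columns, 06583 comes back unchanged.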

@turicas
Owner

turicas commented May 26, 2020

@aborruso if you think the field type is being detected incorrectly (for example, it was detected as float while you think it should be int), please open a new issue with data I can use to reproduce it.

@aborruso
Author

Hi @turicas, my input is this example HTML:

<!DOCTYPE html>
<html>
<body>
<table id="results" border="0" class="regpub_dati c35">
		<tbody>
			<tr class="c28">
				<th class="c27">Beneficiario</th>
				<th class="c27">Comune</th>
				<th class="c27">CAP</th>
				<th class="c27">Provincia </th>
				<th class="c27">Importo</th>
			</tr>
			
			<tr>
				<td class="c31">RNDFNC60E16</td>
				<td class="c31">RIPACANDIDA</td>
				<td class="c31">85020</td>
				<td class="c31">POTENZA</td>
				<td class="c34">09269</td>
			</tr>
			
			<tr>
				<td class="c31">RNDFNC60E16</td>
				<td class="c31"></td>
				<td class="c31"></td>
				<td class="c31">POTENZA</td>
				<td class="c34">05269</td>
			</tr>
		</tbody>
		</table>
		</body>
</html>

If I run rows convert input.html output.csv, I get:

+--------------+-------------+-------+-----------+---------+
| beneficiario | comune      | cap   | provincia | importo |
+--------------+-------------+-------+-----------+---------+
| RNDFNC60E16  | RIPACANDIDA | 85020 | POTENZA   | 9269.0  |
| RNDFNC60E16  | -           | -     | POTENZA   | 5269.0  |
+--------------+-------------+-------+-----------+---------+

The last field is a string, but it becomes a float.

I'm not able to create a PR to help you. But rows is really a great tool for converting almost anything into a usable table. My feature request is to add a CLI option, such as -I, that disables all field type inference and forces every field to be a text field. It would then be up to the user, after the first import, to apply the right casts.
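
To illustrate the behaviour being requested (every cell kept as text, no inference), here is a small standard-library sketch. It is not part of rows and not a proposed implementation of the option; the names TableTextParser and html_table_to_csv_as_text are hypothetical, and it assumes a single, non-nested table.

```python
import csv
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, row by row, without casting."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the current cell, or None outside it

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv_as_text(html_text, csv_path):
    """Write every table cell to CSV exactly as text and return the rows."""
    parser = TableTextParser()
    parser.feed(html_text)
    parser.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as fobj:
        csv.writer(fobj).writerows(parser.rows)
    return parser.rows
```

With the example table above, the Importo cells would stay as 09269 and 05269, and the user could cast the other columns afterwards.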

Thank you
