Bug 147176 - Double spaces stripped on import to Base
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Base
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL: https://ask.libreoffice.org/t/double-...
Whiteboard:
Keywords: difficultyMedium, easyHack, skillCpp
Depends on:
Blocks:
 
Reported: 2022-02-04 10:33 UTC by Colin Shearer
Modified: 2023-03-23 14:59 UTC

See Also:
Crash report or crash signature:


Description Colin Shearer 2022-02-04 10:33:14 UTC
Description:
I’m importing data into Base via the following steps:

1. Export CSV file from another application
2. Open CSV file in Calc
3. Select contents in Calc, and Copy
4. Paste into a table in Base

It works fine, but strings that originally contained double spaces end up with only a single space when stored in Base. (This is more than cosmetic in my application, as these strings represent file names, so subsequent file lookups fail.) The double spaces are still present when the data is viewed in Calc, so they appear to be lost on import to Base.

Steps to Reproduce:
1. Create a record in Calc corresponding to a table schema in Base
2. In a string field, include a string with double spaces, e.g. "a  string"
3. Select and copy the record, select the table in Base, paste
4. Examine the record in the table

Actual Results:
"a string"

Expected Results:
"a  string"


Reproducible: Always


User Profile Reset: No



Additional Info:
Version: 7.1.8.1 (x64) / LibreOffice Community
Build ID: e1f30c802c3269a1d052614453f260e49458c82c
CPU threads: 8; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-GB (en_GB); UI: en-US
Calc: threaded

Apparently the problem persists in later versions.
Comment 1 Mike Kaganski 2022-02-04 15:51:38 UTC
> 4. Paste into a table in Base

That is, you select a table in Base's Tables view and paste over it, which opens the Copy Table dialog, right?

If so, this happens because such a paste uses the HTML clipboard format to transfer the data, and HTML merges consecutive spaces.
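
To illustrate the effect, here is a minimal sketch of HTML-style whitespace collapsing. It is not LibreOffice code; the function name collapseHtmlWhitespace is hypothetical, and the real HTML import filter is more involved. It only shows why "a  string" arrives as "a string" once the text is treated as HTML.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not LibreOffice code): mimics how HTML text handling
// treats every run of whitespace characters as a single space.
std::string collapseHtmlWhitespace(const std::string& text)
{
    std::string out;
    bool inWhitespaceRun = false;
    for (unsigned char ch : text)
    {
        if (std::isspace(ch))
        {
            // Emit one space for the whole run, then skip the rest of it.
            if (!inWhitespaceRun)
                out.push_back(' ');
            inWhitespaceRun = true;
        }
        else
        {
            out.push_back(static_cast<char>(ch));
            inWhitespaceRun = false;
        }
    }
    // Collapsing can only shorten the text or leave its length unchanged.
    assert(out.size() <= text.size());
    return out;
}
```

Calling collapseHtmlWhitespace("a  string") yields "a string", which matches the result reported in the description.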

You may, instead, right-click the table name, and use Paste Special, choosing Rich Text Format. Then all the spaces will be transferred.

In my testing, however, you then need to correct the destination table name (it tries to create a new table by default), and data can't be pasted without column names.

Is my guess about your workflow correct?
Comment 2 Mike Kaganski 2022-02-04 21:27:59 UTC
FTR: the precedence of the formats in the dialog is defined in OTableCopyHelper::pasteTable. It is unchanged since
https://git.libreoffice.org/core/+/4494db00d08efead68bfbd753270944329221925

author	Kurt Zenker <kz@openoffice.org>	Fri Jan 21 16:19:46 2005 +0000	

INTEGRATION: CWS dba22 (1.1.2); FILE ADDED
2005/01/10 08:36:51 oj 1.1.2.2: compile error
2005/01/03 12:50:04 oj 1.1.2.1: #i39146# renable DnD in beamer

It is questionable whether the HTML filter provides better results than RTF. Maybe it makes sense to change the precedence, which would be an easy hack.
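
As a rough sketch of what "changing the precedence" means: the paste code walks a list of formats in priority order and uses the first one the clipboard offers. The names below (ClipFormat, pickFormat) are hypothetical and are not taken from OTableCopyHelper::pasteTable; the real code works with LibreOffice clipboard format IDs. The sketch also assumes that a Calc copy offers both HTML and RTF, which the Paste Special workaround suggests.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical format identifiers, for illustration only.
enum class ClipFormat { None, BaseTable, Html, Rtf };

// Returns the first format from `precedence` that is present in `offered`,
// or ClipFormat::None if the clipboard offers none of them.
ClipFormat pickFormat(const std::vector<ClipFormat>& precedence,
                      const std::set<ClipFormat>& offered)
{
    assert(!precedence.empty()); // a precedence list must name at least one format
    for (ClipFormat candidate : precedence)
    {
        if (offered.count(candidate) != 0)
            return candidate;
    }
    return ClipFormat::None;
}
```

With the order { BaseTable, Html, Rtf } and a clipboard offering Html and Rtf, pickFormat returns Html; swapping the last two entries makes it return Rtf for the same clipboard, while a clipboard that also offers BaseTable is unaffected.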
Comment 3 Colin Shearer 2022-02-04 21:50:19 UTC
Yes, you're correct about my workflow.

One other thing: 

I’ve now gone through the “New database” process to create one whose tables were based on my CSV files. I then opened one of these and manually copied then pasted the data into the relevant table in my main database. It seems to work ok and at a quick check appears to have preserved the double spaces.

So I followed the same copy-paste procedure for Calc->Base (which stripped the double space) and Base->Base (which didn't). That would imply Base->Base pastes don't use the HTML clipboard format - is that correct?
Comment 4 Mike Kaganski 2022-02-04 22:03:46 UTC
(In reply to Colin Shearer from comment #3)
> That would imply Base->Base
> pastes don't use the HTML clipboard format - is that correct?

Yes, as the code [1] shows, the Base filter is tried first, before HTML.

[1] https://opengrok.libreoffice.org/xref/core/dbaccess/source/ui/misc/TableCopyHelper.cxx?r=7183b3ba#210
Comment 5 Mike Kaganski 2022-02-05 08:52:12 UTC
So this easyhack needs:
1. Changing the order of the filters (testing RTF before HTML), code pointer in comment 4;
2. Making RTF import use selected table name by default (code pointer - commit https://git.libreoffice.org/core/+/d736eef49512ee4623c7fe8d8b6fcb09669df7f8 "INTEGRATION: CWS dba24a (1.32.16); FILE MERGED");
3. Making RTF import allow pasting a single data row without column headers. Code pointer: in ORTFReader::NextToken, see case RTF_TROWD and its handling of m_bAppendFirstLine. The problem there is that SvParser (a parent class) enters a specific state (SvParserState::accepted), and its next character is set to something different from the value expected at the stream position that is being reset at that point; so the task is to restore these values (i.e., restore the parser state) together with restoring the rInput position.
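
The issue in item 3 can be shown with a toy model. This is a sketch under stated assumptions, not SvParser or ORTFReader code: ToyParser, Snapshot, restore and restorePositionOnly are all hypothetical names, and the only parser state modelled is a single look-ahead character. It shows that rewinding the stream position alone leaves the look-ahead stale, while restoring both together lets the input be parsed again.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical toy parser: it always holds one character that has already
// been read from the stream (the look-ahead), similar in spirit to a parser
// that caches its "next character".
class ToyParser
{
public:
    struct Snapshot
    {
        std::streampos pos; // stream position at the time of the snapshot
        int next;           // look-ahead character at the time of the snapshot
    };

    explicit ToyParser(const std::string& text) : m_in(text) { m_next = m_in.get(); }

    Snapshot save() { return { m_in.tellg(), m_next }; }

    // Correct rewind: restore the stream position AND the cached look-ahead.
    void restore(const Snapshot& s)
    {
        m_in.clear(); // drop eof/fail flags left by reading to the end
        m_in.seekg(s.pos);
        assert(m_in.good());
        m_next = s.next;
    }

    // Incomplete rewind: the stream is repositioned, but the look-ahead still
    // holds whatever it was set to before the reset.
    void restorePositionOnly(const Snapshot& s)
    {
        m_in.clear();
        m_in.seekg(s.pos);
    }

    // Consumes and returns the remaining input as seen by the parser.
    std::string readAll()
    {
        std::string out;
        while (m_next != EOF)
        {
            out.push_back(static_cast<char>(m_next));
            m_next = m_in.get();
        }
        return out;
    }

private:
    std::istringstream m_in;
    int m_next;
};
```

In this model, taking a snapshot, calling readAll(), and then calling restorePositionOnly() makes the next readAll() return an empty string, because the look-ahead is still the end-of-input marker. Calling restore() instead makes readAll() return the full text again.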

The end result should allow the following operation:

1. In a new Base database, create a table with two text (VARCHAR) columns (first set as primary key);
2. In a new Calc document, put "a b  c   d" into A1, and "e     f      g" into B1;
3. Select row 1 in the Calc document and copy to clipboard;
4. In the Base database, select the table in the Tables view, and press Ctrl+V to paste;
5. In the Copy Table dialog, make sure that the selected table name is automatically set in the Table name box, Append data is pre-selected (and Use first line as column names is checked);
6. Uncheck Use first line as column names;
7. Click Create button;
8. Check that the row is added to the table, and has the proper spaces.
Comment 6 Mike Kaganski 2022-02-05 09:13:16 UTC
(In reply to Mike Kaganski from comment #5)
> 3. Making RTF import allow pasting single rata row without column headers.
> Code pointer: ...

See also commit https://git.libreoffice.org/core/+/7408980c49bb025c6df35b844abff2e0cfb88c88

author	Ivo Hinkelmann <ihi@openoffice.org>	Wed Nov 21 15:06:36 2007 +0000
INTEGRATION: CWS dba24c (1.23.40); FILE MERGED
Comment 7 Hossein 2023-03-23 14:59:02 UTC
Re-evaluating the EasyHack in 2022

This EasyHack is still relevant. The problem is still reproducible with LO Dev 7.6 and the instructions from comment 5:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ed0372bac123b402fe3cd694a455e8328117752d
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: fa-IR (fa_IR); UI: en-US
Calc: threaded

I changed the difficulty level to medium, as we use the difficulty level beginner for trivial changes.